I thought that went to the "index" of the token. I may not understand it
completely, but this is how I currently view the TokenStream.
For example, if my text were the following:

    This is an Example

"This" has index 1, "is" has index 2, "an" has index 3, and "Example" has
index 4.
What I have is the actual "character position" in the original text. "This"
is characters 0-3, "is" is characters 5-6, "an" is characters 8-9, and
"Example" is characters 11-17. I know that given token 4 ("Example") I can
get the startOffset and endOffset (11 and 17). What I'm wondering is
whether, given a character offset, I can get a token index. (I.e., given
character offset 12, it would return 4, because "Example", which spans
characters 11-17, is the token closest to character 12.)
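
To make the question concrete, here is a rough sketch of the lookup I have
in mind, written in plain Java with no Lucene classes. The whitespace
splitter is only a stand-in for StandardTokenizer, and the names
OffsetLookup, Tok, tokenize, and tokenIndexAt are ones I made up for
illustration. The end positions are inclusive so they match the numbering
above, and token indexes start at 1.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetLookup {
    /** A token plus its inclusive start/end character positions (hypothetical helper type). */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in tokenizer: splits on whitespace and records each token's character positions. */
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            // Skip whitespace between tokens.
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < s.length() && !Character.isWhitespace(s.charAt(i))) i++;
            if (i > start) out.add(new Tok(s.substring(start, i), start, i - 1));
        }
        return out;
    }

    /**
     * Returns the 1-based index of the token that covers the given character
     * offset, or of the nearest token if the offset falls between tokens.
     * Ties go to the earlier token. Returns 0 if the text has no tokens.
     */
    static int tokenIndexAt(String s, int offset) {
        List<Tok> toks = tokenize(s);
        int best = 0;
        int bestDist = Integer.MAX_VALUE;
        for (int k = 0; k < toks.size(); k++) {
            Tok t = toks.get(k);
            int dist;
            if (offset < t.start) dist = t.start - offset;
            else if (offset > t.end) dist = offset - t.end;
            else dist = 0; // offset lies inside this token
            if (dist < bestDist) {
                bestDist = dist;
                best = k + 1;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Offset 12 falls inside "Example" (characters 11-17), which is token 4.
        System.out.println(tokenIndexAt("This is an Example", 12));
    }
}
```

With a real TokenStream I assume the same loop would compare the offset
against each Token's startOffset and endOffset while counting tokens, but I
have not found anything built in that does this.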
--JP
On 7/6/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I never got a response to this and thought maybe I was too wordy.
:
: I'm wondering if there's a way where given a position in the original text
: you can retrieve the token index that is nearest to that position using the
: StandardToken/StandardTokenizer classes?
i may not be understanding the question, but wouldn't that just be...
TokenStream s = getTokenStreamForOriginalText();
Token t = null;
// advance through the stream, keeping the last token read
for (int i = 0; i < thePositionYouKnow; i++) {
    t = s.next();
}
return t;
?
-Hoss