Re: Retrieve nearest token based off location in original Text

John Paul Sondag Fri, 06 Jul 2007 10:51:03 -0700

I thought that went to the "index" of the token.  I may not understand it
completely but this is how I currently view the TokenStream


For example if my text was the following:

This is an Example

This is index of 1, is has index 2, an has index 3 Example has index 4.
What I have is the actual "character position" in the original text.  "This"
is characters 0-3, "is" is characters 5-6, "an" is characters 8-9, and
"Example" is characters 11-17.  I know that given Token 4 (Example) I can
get the startOffset and endOffset (11, and 17).  What I'm wondering is given
character offset can I get a tokenIndex.  (I.E.  given character offset 12,
it would return 3, because Example is the closest token that starts at
character 12).

--JP

On 7/6/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: I never got a response to this and thought maybe I was too wordy.
:
: I'm wondering if there's a way where given a position in the original
text
: you can retrieve the token index that is nearest to that position using
the
: StandardToken/StandardTokenizer classes?

i may not be understanding the question, but wouldn't that just be...

  TokenStream s = getTokenStreamForOrriginalText()
  Token t;
  for (i=0; i<thePositionYouKnow; i++) {
    t = s.next();
  }
  return t;

?


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Retrieve nearest token based off location in original Text

Reply via email to