Hello,

Hoping someone might clear up a question for me:
When tokenizing, we provide start and end character offsets for each token, locating it within the source text. If I tokenize the text "word" and then search for the term "word" in the same field, how can I recover this character offset information from the matching documents in order to precisely locate the word?

I have been storing this character info myself using payload data, but if Lucene already stores it, then I am doing so needlessly. If recovering this character offset info isn't possible, what is it used for?

thanks so much,

C>T>

--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385  f.212-285-8999
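P.S. In case it helps to see what I mean by storing the offsets myself: below is a minimal sketch of the kind of token filter I have in mind, written against a recent Lucene analysis API. The class name OffsetPayloadFilter and the byte-packing scheme are my own choices, not anything from Lucene itself.

    import java.io.IOException;
    import java.nio.ByteBuffer;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.util.BytesRef;

    // Copies each token's start/end character offsets into its payload
    // so they can be read back from the postings at search time.
    public final class OffsetPayloadFilter extends TokenFilter {
      private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
      private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

      public OffsetPayloadFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        // Pack the two int offsets into an 8-byte payload.
        byte[] packed = ByteBuffer.allocate(8)
            .putInt(offsetAtt.startOffset())
            .putInt(offsetAtt.endOffset())
            .array();
        payloadAtt.setPayload(new BytesRef(packed));
        return true;
      }
    }

At query time I pull the payload back out of the postings and decode the two ints, which is exactly the bookkeeping I would like to drop if Lucene already keeps the offsets somewhere.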