Hello,

To my knowledge, Lucene does not preserve the character position of tokens; only the ordinal position of tokens within a document / field is preserved. You therefore need to store this character offset information separately, say, as payload data.
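As a rough sketch of that idea: a payload is just a byte array, so the start and end character offsets could be packed into it as two ints. The class name `OffsetPayloadCodec` and its methods are hypothetical helpers made up for illustration, not part of Lucene, and the 8-byte layout is only one possible choice.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: packs a token's character
// offsets into a byte[] of the kind a payload can carry.
final class OffsetPayloadCodec {
    private OffsetPayloadCodec() {}

    // Encode start and end character offsets as two big-endian ints (8 bytes).
    static byte[] encode(int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException("invalid offsets");
        }
        return ByteBuffer.allocate(8).putInt(startOffset).putInt(endOffset).array();
    }

    // Decode the 8 bytes back into {startOffset, endOffset}.
    static int[] decode(byte[] data) {
        if (data == null || data.length != 8) {
            throw new IllegalArgumentException("expected exactly 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new int[] { buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        byte[] bytes = encode(12, 19);
        int[] offsets = decode(bytes);
        System.out.println(offsets[0] + ".." + offsets[1]); // prints 12..19
    }
}
```

In a token filter, the encoded bytes would presumably be built from the OffsetAttribute's startOffset() and endOffset() values and set on the PayloadAttribute. If the payload already carries your semantic information, the offsets would have to share that byte array, for example as a fixed-length prefix.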
best,

C>T>

On Fri, Feb 26, 2010 at 3:41 PM, Christopher Condit <con...@sdsc.edu> wrote:

> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access to
> the payload locations after the index is created. I'd like to know the
> offset in the original text for the token with the payload - and get this
> information for all payloads that are set in a Field even if they don't
> relate to the query. I tried (from the highlighting filter):
> TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
> while (tokens.incrementToken()) {
>   TermAttribute term = tokens.getAttribute(TermAttribute.class);
>   if (toker.hasAttribute(PayloadAttribute.class)) {
>     PayloadAttribute payload =
>         tokens.getAttribute(PayloadAttribute.class);
>     OffsetAttribute offset = toker.getAttribute(OffsetAttribute.class);
>   }
> }
> But the OffsetAttribute never seems to contain any information.
> In my token filter do I need to do more than:
> offsetAtt = addAttribute(OffsetAttribute.class);
> during construction in order to store Offset information?
>
> Thanks,
> -Chris

--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999