Hello,

To my knowledge, Lucene does not preserve the character positions of
tokens; only the ordinal position of each token within a document / field
is preserved.  You therefore need to store the character offset information
separately, for example as payload data.
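
As a rough sketch of the packing step only (the class and method names here
are made up for illustration and are not part of Lucene), the start and end
offsets could be written into an 8-byte array like this:

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not a Lucene class): packs a token's character
 * offsets into an 8-byte array that can be stored as payload data.
 */
final class OffsetPayload {
    private OffsetPayload() {}

    /** Encodes start and end offsets as two big-endian 4-byte ints. */
    static byte[] encode(int startOffset, int endOffset) {
        return ByteBuffer.allocate(8)
                .putInt(startOffset)
                .putInt(endOffset)
                .array();
    }

    /** Reads the start offset back from bytes 0..3. */
    static int startOffset(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(0);
    }

    /** Reads the end offset back from bytes 4..7. */
    static int endOffset(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(4);
    }
}
```

In a token filter you would presumably read the values from the
OffsetAttribute (startOffset() and endOffset()) while the text is being
analyzed, set the encoded bytes as the token's payload, and decode them
again when you read the payloads back from the index.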

best,

C>T>

On Fri, Feb 26, 2010 at 3:41 PM, Christopher Condit <con...@sdsc.edu> wrote:

> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access to
> the payload locations after the index is created. I'd like to know the
> offset in the original text for the token with the payload - and get this
> information for all payloads that are set in a Field even if they don't
> relate to the query. I tried (from the highlighting filter):
> TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
> while (tokens.incrementToken()) {
>   TermAttribute term = tokens.getAttribute(TermAttribute.class);
>   if (tokens.hasAttribute(PayloadAttribute.class)) {
>     PayloadAttribute payload = tokens.getAttribute(PayloadAttribute.class);
>     OffsetAttribute offset = tokens.getAttribute(OffsetAttribute.class);
>   }
> }
> But the OffsetAttribute never seems to contain any information.
> In my token filter do I need to do more than:
> offsetAtt = addAttribute(OffsetAttribute.class);
> during construction in order to store Offset information?
>
> Thanks,
> -Chris


-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
