Re: what is the offsets and payload in DocsAndPositionsEnum for ??

Michael McCandless Sun, 18 Nov 2012 16:54:25 -0800

On Sun, Nov 18, 2012 at 12:09 PM, wgggfiy <[email protected]> wrote:
> I'm now studying Lucene 4.0.
> 1. What are startOffset and endOffset for? Is there a code example?


These are set by the analyzer, to the start and end character offset
for this token (using the OffsetAttribute).  The offsets are used for
highlighting.
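
To make the offsets concrete, here is a minimal sketch in plain Java. It does
not use Lucene at all: OffsetDemo, Token, tokenize and highlight are
hypothetical names invented for this illustration, and the whitespace
splitting only stands in for a real analyzer. It follows the same convention
as OffsetAttribute, where startOffset is the index of the token's first
character and endOffset is one past its last character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's analyzer.
final class OffsetDemo {

    /** One token plus its character offsets into the original text. */
    static final class Token {
        final String term;
        final int startOffset; // index of the token's first character
        final int endOffset;   // index one past the token's last character

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace and records where each token sits in the text. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    /** Uses the offsets to wrap the matched token in the original text. */
    static String highlight(String text, Token t) {
        return text.substring(0, t.startOffset)
                + "<b>" + text.substring(t.startOffset, t.endOffset) + "</b>"
                + text.substring(t.endOffset);
    }

    public static void main(String[] args) {
        String text = "I study lucene 4.0";
        for (Token t : tokenize(text)) {
            System.out.println(t.term + " [" + t.startOffset + ", " + t.endOffset + ")");
        }
        System.out.println(highlight(text, tokenize(text).get(2)));
    }
}
```

The highlighter never has to re-analyze or search the text; it slices the
original string using the stored offsets.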

> 2. What is a payload? I know only a little about it: it can be used for
> things like font weight or an enclosing XML tag.

It's an arbitrary per-token-position byte[] that you set during
analysis (using the PayloadAttribute).

> 3. I have an item like (lucene, 350, 450, 33.2, 2), where 350 and 450 are the
> offsets of the term 'lucene', 33.2 is a score, and 2 is some id. My
> question is: how can I get it indexed?
> My first idea is to implement my own postings list format, but is it possible
> to do it with the startOffset, endOffset and payload?

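You should probably encode them all into the payload.  Lucene requires
that the offsets are "in order" (they can't go backwards from one token
to the next), so the offset fields are not a safe place to store
arbitrary values.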
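
As a rough sketch of the encoding step only, the four numbers from the
question could be packed into a fixed 16-byte array with java.nio.ByteBuffer.
PayloadCodec and its methods are hypothetical names made up for this
illustration, and the layout (int, int, float, int, big-endian) is just one
assumption; any layout works as long as the reader and writer agree on it.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs (startOffset, endOffset, score, id) into bytes.
final class PayloadCodec {

    /** Assumed layout: int start | int end | float score | int id = 16 bytes. */
    static final int SIZE = 16;

    static byte[] encode(int startOffset, int endOffset, float score, int id) {
        return ByteBuffer.allocate(SIZE)
                .putInt(startOffset)
                .putInt(endOffset)
                .putFloat(score)
                .putInt(id)
                .array();
    }

    // Each decoder takes the position where the payload begins inside the
    // array, because the payload bytes may sit in the middle of a larger buffer.
    static int startOffset(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getInt(offset);
    }

    static int endOffset(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getInt(offset + 4);
    }

    static float score(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getFloat(offset + 8);
    }

    static int id(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getInt(offset + 12);
    }

    public static void main(String[] args) {
        byte[] payload = encode(350, 450, 33.2f, 2);
        System.out.println(startOffset(payload, 0) + " " + endOffset(payload, 0)
                + " " + score(payload, 0) + " " + id(payload, 0));
    }
}
```

During analysis the array would be wrapped in a BytesRef and handed to the
PayloadAttribute.  When reading postings, DocsAndPositionsEnum.getPayload()
returns a BytesRef, and its bytes and offset fields can be passed to the
decoders above.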

Mike McCandless

http://blog.mikemccandless.com
