Re: Getting position increments directly from the index

Michael McCandless Thu, 23 May 2013 07:40:22 -0700

On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov
<[email protected]> wrote:


> But, just to clarify, is there a way to get, let's say, a vector of position 
> increments directly from the index, without re-parsing document contents?

Term vectors (as Jack suggested) are one option, but they are very
heavy: they slow down indexing, take a lot of disk space, and are slow
to load at search time (one seek per document).

You can enumerate all positions for each term x doc pair in the
postings, but you'd then need to collate by document to get the max
position (that of the last term) for each document.  I guess an
int[maxDoc] would do the trick: fill it in while enumerating, then walk
that array dividing each maxPosition by 1000.  Or index the sentence
token :)
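
A rough sketch of the collation step, in plain Java.  It deliberately
uses no Lucene calls: the nested map is a hypothetical stand-in for
whatever you get from walking the postings (term -> doc -> positions),
and the names maxPositions and divideByGap are made up for
illustration.  The 1000 is assumed to be the position gap used at
indexing time.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch; the nested map is a hypothetical stand-in for the
// postings, not a Lucene API.
class MaxPositionSketch {

    // postings: term -> (docId -> positions of that term in that doc)
    static int[] maxPositions(int maxDoc, Map<String, Map<Integer, int[]>> postings) {
        int[] maxPos = new int[maxDoc];
        Arrays.fill(maxPos, -1); // -1 marks "no positions seen for this doc"
        for (Map<Integer, int[]> perDoc : postings.values()) {
            for (Map.Entry<Integer, int[]> e : perDoc.entrySet()) {
                int doc = e.getKey();
                for (int pos : e.getValue()) {
                    if (pos > maxPos[doc]) {
                        maxPos[doc] = pos; // collate by document across all terms
                    }
                }
            }
        }
        return maxPos;
    }

    // Walk the array and divide each max position by the gap (assumed 1000).
    static int[] divideByGap(int[] maxPos, int gap) {
        int[] out = new int[maxPos.length];
        for (int i = 0; i < maxPos.length; i++) {
            out[i] = maxPos[i] < 0 ? -1 : maxPos[i] / gap;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, int[]>> postings = new LinkedHashMap<>();
        Map<Integer, int[]> foo = new LinkedHashMap<>();
        foo.put(0, new int[] {0, 1001});
        foo.put(1, new int[] {3});
        postings.put("foo", foo);
        Map<Integer, int[]> bar = new LinkedHashMap<>();
        bar.put(0, new int[] {1, 2002});
        postings.put("bar", bar);

        int[] maxPos = maxPositions(3, postings);
        System.out.println(Arrays.toString(maxPos));
        System.out.println(Arrays.toString(divideByGap(maxPos, 1000)));
    }
}
```

How to read the quotient depends on how positions were assigned at
indexing time, so treat it as an approximation unless you control the
gap and the sentence lengths.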

Mike McCandless

http://blog.mikemccandless.com
