Re: Term numbering and range filtering

Michael McCandless Tue, 11 Nov 2008 12:56:23 -0800

Also, one nice optimization we could do with the "term number column-stride array" is to do bit packing (borrowing from the PFOR code) dynamically.

I.e., since we know there are X unique terms in this segment, when populating the array that maps docID to term number we could use exactly the right number of bits (ceil(log2(X)) per document). Enumerated fields with few unique values (e.g., country, state) would take relatively little RAM. With LUCENE-1231, where the fields are stored column-stride on disk, we could do this packing during indexing, such that loading at search time is very fast.
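To make the bit-packing idea concrete, here is a minimal Java sketch of a docID-to-term-number array that uses exactly enough bits for X unique terms. The class and method names are hypothetical (this is neither a Lucene API nor the PFOR code); it assumes term numbers are dense ints 0..X-1 and that the array is filled once per segment.

```java
// Hypothetical sketch, not a Lucene API: a docID -> term number array
// packed with exactly bitsRequired(numUniqueTerms - 1) bits per document.
final class PackedTermNumbers {
    private final long[] blocks;   // packed bits, 64 per long
    private final int bitsPerValue;
    private final long mask;       // low bitsPerValue bits set

    PackedTermNumbers(int maxDoc, int numUniqueTerms) {
        if (numUniqueTerms < 1) {
            throw new IllegalArgumentException("need at least one term");
        }
        // Term numbers run 0..numUniqueTerms-1, so size for the largest one.
        this.bitsPerValue = bitsRequired(numUniqueTerms - 1);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) maxDoc * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    // Smallest bit width that can hold maxValue (at least 1).
    static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int docID, int termNumber) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which long holds the first bit
        int shift = (int) (bitPos & 63);    // offset inside that long
        long value = termNumber & mask;
        // Write the low part of the value into the first long.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two longs: put its high bits in the next one.
            long lowMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~lowMask) | (value >>> (64 - shift));
        }
    }

    int get(int docID) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            // Pull the remaining high bits from the next long.
            value |= blocks[block + 1] << (64 - shift);
        }
        return (int) (value & mask);
    }
}
```

With about 200 unique country values, for example, each document needs 8 bits instead of a 32-bit int, a 4x saving; a field with only a handful of values shrinks further.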


Mike

Paul Elschot wrote:

On Tuesday 11 November 2008 11:29:27 Michael McCandless wrote:


The other part of your proposal was to somehow "number" term text
such that term range comparisons can be implemented as fast int
comparisons.

...


  http://fontoura.org/papers/paramsearch.pdf

However, that'd be a much deeper change to Lucene.
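A minimal Java sketch of the numbering idea, assuming a single-valued field and that a term's number is simply its position in the segment's sorted term list (so numbers are only meaningful within one segment). The names are hypothetical, and this is not the scheme from the paper linked above. The string work is limited to two binary searches; the per-document test uses int comparisons only.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not Lucene's API: once each unique term in a
// segment is numbered by its position in sorted order, a term range
// becomes an int range and filtering is a plain int comparison per doc.
final class TermNumberRangeFilter {

    // First index whose term is >= key.
    static int lowerBound(String[] sortedTerms, String key) {
        int i = Arrays.binarySearch(sortedTerms, key);
        return i >= 0 ? i : -i - 1;
    }

    // First index whose term is > key.
    static int upperBound(String[] sortedTerms, String key) {
        int i = Arrays.binarySearch(sortedTerms, key);
        return i >= 0 ? i + 1 : -i - 1;
    }

    // sortedTerms: unique terms of the segment, sorted; index = term number.
    // docTermNumbers: term number of each document (one term per doc assumed).
    // Returns the docs whose term lies in [lower, upper], both inclusive.
    static BitSet filter(String[] sortedTerms, int[] docTermNumbers,
                         String lower, String upper) {
        // Two binary searches over the term dictionary up front...
        int lo = lowerBound(sortedTerms, lower);
        int hi = upperBound(sortedTerms, upper);
        BitSet hits = new BitSet(docTermNumbers.length);
        // ...then every document is checked with int comparisons only.
        for (int doc = 0; doc < docTermNumbers.length; doc++) {
            int t = docTermNumbers[doc];
            if (t >= lo && t < hi) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```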


The cheap version is hierarchical prefixing, described here:

http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Regards,
Paul Elschot
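One way to read "hierarchical prefixing" for dates is sketched below in Java, under the assumption that each document is indexed with a year, a year+month and a year+month+day token, and that a range query ORs together whole-year and whole-month tokens wherever they fit, falling back to day tokens only at the edges. The class and token layout are hypothetical; the wiki page's exact scheme may differ.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of hierarchical prefixing for dates. Year, month and
// day tokens have different lengths (4, 6, 8 digits), so they cannot collide.
final class DatePrefixTerms {

    // Tokens indexed for one document's date, e.g. 2008-11-11 ->
    // [2008, 200811, 20081111].
    static List<String> indexTerms(LocalDate d) {
        return List.of(year(d), month(d), day(d));
    }

    // Tokens whose union covers [start, end] inclusive, using a whole year
    // or month token wherever that unit fits entirely inside the range.
    static List<String> cover(LocalDate start, LocalDate end) {
        List<String> terms = new ArrayList<>();
        LocalDate d = start;
        while (!d.isAfter(end)) {
            LocalDate yearEnd = d.withDayOfYear(d.lengthOfYear());
            LocalDate monthEnd = d.withDayOfMonth(d.lengthOfMonth());
            if (d.getDayOfYear() == 1 && !yearEnd.isAfter(end)) {
                terms.add(year(d));
                d = yearEnd.plusDays(1);
            } else if (d.getDayOfMonth() == 1 && !monthEnd.isAfter(end)) {
                terms.add(month(d));
                d = monthEnd.plusDays(1);
            } else {
                terms.add(day(d));
                d = d.plusDays(1);
            }
        }
        return terms;
    }

    private static String year(LocalDate d) {
        return String.format(Locale.ROOT, "%04d", d.getYear());
    }

    private static String month(LocalDate d) {
        return String.format(Locale.ROOT, "%04d%02d", d.getYear(), d.getMonthValue());
    }

    private static String day(LocalDate d) {
        return String.format(Locale.ROOT, "%04d%02d%02d",
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }
}
```

For 2008-10-30 through 2009-02-02 (96 days), the cover is seven tokens: 20081030, 20081031, 200811, 200812, 200901, 20090201, 20090202.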

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


