Also, one nice optimization we could do with the "term number
column-stride array" is to do bit packing (borrowing from the PFOR
code) dynamically.
I.e., since we know there are X unique terms in this segment, when
populating the array that maps docID to term number we could use
exactly the right number of bits (about log2(X), rounded up) per
entry. Enumerated fields with few unique values (e.g., country, state)
would take relatively little RAM.
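To make the idea concrete, here is a rough sketch of such a packed
array. It is only an illustration, not the PFOR code and not an existing
Lucene class: the name PackedTermNumbers and all of its methods are made
up for this example, and it assumes term numbers run from 0 to X-1.

```java
/** Hypothetical sketch: fixed-width, bit-packed docID -> term number array. */
final class PackedTermNumbers {
    private final long[] blocks;    // packed storage, 64 bits per block
    private final int bitsPerValue; // width chosen from the unique term count
    private final long mask;        // low bitsPerValue bits set

    PackedTermNumbers(int docCount, int uniqueTermCount) {
        // Bits needed for values 0 .. uniqueTermCount-1 (never fewer than 1).
        int maxValue = Math.max(1, uniqueTermCount - 1);
        this.bitsPerValue = 32 - Integer.numberOfLeadingZeros(maxValue);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) docCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int docID, int termNumber) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = termNumber & mask;
        // Clear the target bits in this block, then write the low part.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two blocks: write its high bits into the next one.
            int lowBits = bitsPerValue - spill;
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> lowBits);
        }
    }

    int get(int docID) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (v & mask);
    }

    /** Approximate RAM used by the packed values, in bytes. */
    long ramBytes() {
        return 8L * blocks.length;
    }
}
```

With this layout a field with five unique values needs 3 bits per
document instead of the 32 bits a plain int[] would use.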
With LUCENE-1231, where the fields are stored column-stride on disk,
we could do this packing during indexing so that loading at search
time is very fast.
Mike
Paul Elschot wrote:
On Tuesday 11 November 2008 11:29:27, Michael McCandless wrote:
The other part of your proposal was to somehow "number" term text
such that term range comparisons can be implemented as fast int
comparisons.
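One way to picture this, as a sketch only: if the unique terms of a
segment are kept sorted, a term's position in that order can serve as
its number, and a range over term text becomes a range over ints. The
class TermOrdRange below is hypothetical, and it assumes plain
String.compareTo ordering, which may not match the order the index
actually uses.

```java
import java.util.Arrays;

/** Hypothetical sketch: range checks on term numbers instead of term text. */
final class TermOrdRange {
    private final String[] sortedTerms; // unique terms, sorted ascending
    private final int[] docToOrd;       // docID -> index into sortedTerms

    TermOrdRange(String[] sortedTerms, int[] docToOrd) {
        this.sortedTerms = sortedTerms;
        this.docToOrd = docToOrd;
    }

    /** Number of the first term >= lower (may equal sortedTerms.length). */
    int lowerOrd(String lower) {
        int i = Arrays.binarySearch(sortedTerms, lower);
        return i >= 0 ? i : -i - 1;
    }

    /** Number of the last term <= upper (may be -1 if there is none). */
    int upperOrd(String upper) {
        int i = Arrays.binarySearch(sortedTerms, upper);
        return i >= 0 ? i : -i - 2;
    }

    /** Per-document check: two int comparisons, no string comparison. */
    boolean matches(int docID, int loOrd, int hiOrd) {
        int ord = docToOrd[docID];
        return ord >= loOrd && ord <= hiOrd;
    }
}
```

The two binary searches are done once per query; after that, every
document is tested with int comparisons only.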
...
http://fontoura.org/papers/paramsearch.pdf
However that'd be quite a bit deeper change to Lucene.
A cheaper alternative is hierarchical prefixing, described here:
http://wiki.apache.org/jakarta-lucene/DateRangeQueries
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]