On May 31, 2006, at 6:53 AM, Nadav Har'El wrote:
I think the suggestion for position-specific boost is not enough,
and what is really needed is a more general "payload" mechanism,
that allows storing with each position a variable length payload
(byte[]) which the application can use for its purposes.
Would the payload be inserted per-termdoc or per-posting (i.e.
per-position)?
One possible application of this scheme is order-by-date: stuff a
numeric representation of a date into each termdoc. That would
consume an awful lot of index space, but it would make returning
documents within a range of dates very fast.
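
To make the order-by-date idea concrete, here is a minimal sketch,
not anything that exists in Lucene. It assumes a per-termdoc payload
holding a date as a fixed 4-byte count of days since 1970-01-01. The
names DatePayloadSketch, TermDoc, encodeDate, decodeDays and
docsInRange are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DatePayloadSketch {
    /** Hypothetical per-termdoc entry: a document number plus an opaque payload. */
    static final class TermDoc {
        final int doc;
        final byte[] payload;

        TermDoc(int doc, byte[] payload) {
            this.doc = doc;
            this.payload = payload;
        }
    }

    /** Encode a date as a 4-byte big-endian count of days since 1970-01-01. */
    static byte[] encodeDate(LocalDate date) {
        return ByteBuffer.allocate(4).putInt((int) date.toEpochDay()).array();
    }

    /** Decode the day count written by encodeDate. */
    static int decodeDays(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** Return the doc numbers whose payload date lies in [from, to], inclusive. */
    static List<Integer> docsInRange(List<TermDoc> postings, LocalDate from, LocalDate to) {
        int lo = (int) from.toEpochDay();
        int hi = (int) to.toEpochDay();
        List<Integer> hits = new ArrayList<>();
        for (TermDoc td : postings) {
            int days = decodeDays(td.payload);
            // The date is checked while walking the postings; no stored field is fetched.
            if (days >= lo && days <= hi) {
                hits.add(td.doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TermDoc> postings = new ArrayList<>();
        postings.add(new TermDoc(0, encodeDate(LocalDate.of(2006, 5, 1))));
        postings.add(new TermDoc(1, encodeDate(LocalDate.of(2006, 5, 31))));
        postings.add(new TermDoc(2, encodeDate(LocalDate.of(2006, 7, 4))));
        // Prints [1]: only doc 1 falls between 2006-05-15 and 2006-06-30.
        System.out.println(docsInRange(postings,
                LocalDate.of(2006, 5, 15), LocalDate.of(2006, 6, 30)));
    }
}
```

The space cost is visible here: 4 extra bytes for every term in every
document, which is where the "awful lot of index space" comes from.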
Another possibility was raised by Grant in
<http://wiki.apache.org/jakarta-lucene/ConversationsBetweenDougMarvinAndGrant>:
storing part-of-speech along with position.
Doug described arbitrary extensibility via a per-Field codec here:
<http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200605.mbox/
[EMAIL PROTECTED]>. I have to say, I find the idea
of a pluggable posting format enticing.
Adding payloads is actually not difficult, but would require a change
to the index file format (probably the positions file).
In my view, the positions file, the freqs file, and the norms should
all be merged into one, a la Google98[1]. However, an interesting
wrinkle here is, if positions are optional, at what point in the
term-dictionary do you start applying the new decoder? Perhaps we'd need
one postings file per indexed field, or one file per codec.
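
To illustrate the decoder question, here is a rough sketch of a merged
postings stream. It assumes a hypothetical per-field setting (the
Detail enum) that says how much is stored for each posting. Integers
are written as 7-bits-per-byte variable-length values in the style of
Lucene's VInt. None of the names below are existing Lucene classes,
and this is not the actual file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class InterleavedPostingsSketch {
    /** Hypothetical per-field setting: how much each posting carries. */
    enum Detail { DOCS, DOCS_FREQS, DOCS_FREQS_POSITIONS }

    static final class Posting {
        final int doc;
        final int freq;
        final int[] positions;

        Posting(int doc, int freq, int[] positions) {
            this.doc = doc;
            this.freq = freq;
            this.positions = positions;
        }
    }

    /** Write a non-negative int, 7 bits per byte, low-order bits first. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteBuffer in) {
        int b = in.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Interleave doc delta, optional freq, and optional position deltas in one stream. */
    static byte[] encode(Detail detail, List<Posting> postings) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (Posting p : postings) {
            writeVInt(out, p.doc - lastDoc);
            lastDoc = p.doc;
            if (detail != Detail.DOCS) {
                writeVInt(out, p.freq);
            }
            if (detail == Detail.DOCS_FREQS_POSITIONS) {
                int lastPos = 0;
                for (int pos : p.positions) {
                    writeVInt(out, pos - lastPos);
                    lastPos = pos;
                }
            }
        }
        return out.toByteArray();
    }

    /** Decode a stream; the caller must supply the same Detail used to encode it. */
    static List<Posting> decode(Detail detail, byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        List<Posting> result = new ArrayList<>();
        int doc = 0;
        while (in.hasRemaining()) {
            doc += readVInt(in);
            // A freq of 0 here means "not stored" for DOCS-only fields.
            int freq = (detail == Detail.DOCS) ? 0 : readVInt(in);
            int[] positions = new int[detail == Detail.DOCS_FREQS_POSITIONS ? freq : 0];
            int pos = 0;
            for (int i = 0; i < positions.length; i++) {
                pos += readVInt(in);
                positions[i] = pos;
            }
            result.add(new Posting(doc, freq, positions));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting(3, 2, new int[] {1, 5}));
        postings.add(new Posting(10, 1, new int[] {2}));
        for (Detail d : Detail.values()) {
            System.out.println(d + ": " + encode(d, postings).length + " bytes");
        }
    }
}
```

Nothing in the bytes themselves says which Detail produced them, so
the reader has to learn that from somewhere else: the field, the file,
or the codec. That is the wrinkle described above.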
Another, related, improvement, I think, should be to make positions
optional for certain fields.
Why stop there? Norms/boosts are currently optional. Why not make
freqs optional as well? That's the current state of the proposal at
<http://wiki.apache.org/jakarta-lucene/FlexibleIndexing>.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
[1] Brin/Page: "The Anatomy of a Large-Scale Hypertextual Web Search
Engine" <http://dbpubs.stanford.edu:8090/pub/1998-8>.