On May 31, 2006, at 6:53 AM, Nadav Har'El wrote:

I think the suggestion for position-specific boost is not enough,
and what is really needed is a more general "payload" mechanism,
that allows storing with each position a variable length payload
(byte[]) which the application can use for its purposes.

Would the payload be inserted per-termdoc or per-posting (i.e. per-position)?
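
For the sake of discussion, here is a rough sketch of what the per-position variant might look like on disk. It assumes each position is written as a delta from the previous one, followed by a length-prefixed payload, with both integers in a Lucene-style variable-length encoding. The class and method names are hypothetical and not part of any existing Lucene API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch only: one possible per-position payload layout. */
final class PayloadPositionsSketch {

    /** Writes each position as a delta, then the payload length, then the payload bytes. */
    static byte[] encode(int[] positions, byte[][] payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < positions.length; i++) {
            writeVInt(out, positions[i] - prev);
            prev = positions[i];
            writeVInt(out, payloads[i].length);
            out.write(payloads[i], 0, payloads[i].length);
        }
        return out.toByteArray();
    }

    /** Returns the payload of the n-th position (0-based), skipping over earlier entries. */
    static byte[] payloadAt(byte[] data, int n) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        for (int i = 0; in.available() > 0; i++) {
            readVInt(in);                    // position delta, not needed here
            int len = readVInt(in);          // payload length
            byte[] payload = new byte[len];
            in.read(payload, 0, len);
            if (i == n) {
                return payload;
            }
        }
        throw new IndexOutOfBoundsException("no position " + n);
    }

    /** Variable-length int: 7 data bits per byte, high bit set means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteArrayInputStream in) {
        int value = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("truncated input");
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }
}
```

A per-termdoc variant would write the length-prefixed payload once per document rather than once per position, which costs less space but cannot express things like part-of-speech.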

One possible application of this scheme is order-by-date: stuff a numeric representation of a date into each termdoc. That would consume an awful lot of index space, but it would make returning documents within a range of dates very fast.
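
To illustrate the date idea, here is a minimal in-memory sketch. It assumes the date is stored as a day count since the epoch alongside each termdoc, so a range check needs no lookup outside the posting itself. `DateTermDocSketch`, `TermDoc`, and `docsInRange` are made-up names for illustration, not Lucene classes.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only: a termdoc that carries a numeric date. */
final class DateTermDocSketch {

    static final class TermDoc {
        final int docId;
        final long epochDay; // numeric date stored with the termdoc

        TermDoc(int docId, LocalDate date) {
            this.docId = docId;
            this.epochDay = date.toEpochDay();
        }
    }

    /** Returns the ids of documents whose stored date lies in [from, to], in posting order. */
    static List<Integer> docsInRange(List<TermDoc> postings, LocalDate from, LocalDate to) {
        long lo = from.toEpochDay();
        long hi = to.toEpochDay();
        List<Integer> hits = new ArrayList<>();
        for (TermDoc td : postings) {
            if (td.epochDay >= lo && td.epochDay <= hi) {
                hits.add(td.docId);
            }
        }
        return hits;
    }
}
```

The filter runs while the postings are being iterated anyway, which is where the speed would come from; the price is one extra number per termdoc for every term.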

Another possibility was raised by Grant in <http://wiki.apache.org/jakarta-lucene/ConversationsBetweenDougMarvinAndGrant>: storing part-of-speech along with position.

Doug described arbitrary extensibility via a per-Field codec here: <http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200605.mbox/ [EMAIL PROTECTED]>. I have to say, I find the idea of a pluggable posting format enticing.

Adding payloads is actually not difficult, but would require a change
to the index file format (probably the positions file).

In my view, the positions file, the freqs file, and the norms should all be merged into one, a la Google98[1]. However, an interesting wrinkle here is, if positions are optional, at what point in the term-dictionary do you start applying the new decoder? Perhaps we'd need one postings file per indexed field, or one file per codec.

Another, related, improvement, I think, should be to make positions
optional for certain fields.

Why stop there? Norms/boosts are currently optional. Why not make freqs optional as well? That's the current state of the proposal at <http://wiki.apache.org/jakarta-lucene/FlexibleIndexing>.
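
As a rough sketch of what optional components could mean for a single merged posting, assume each field declares which features it stores and the encoder writes only those. The `Feature` enum, `FieldCodecSketch`, and the layout below are assumptions made for illustration; they are not taken from the FlexibleIndexing page or from Lucene.

```java
import java.io.ByteArrayOutputStream;
import java.util.EnumSet;

/** Hypothetical sketch only: a posting whose freq and positions are optional per field. */
final class FieldCodecSketch {

    enum Feature { FREQS, POSITIONS }

    /**
     * Encodes one posting: the doc delta always, the frequency if the field stores
     * freqs or positions, and the position deltas only if the field stores positions.
     */
    static byte[] encodePosting(EnumSet<Feature> features, int docDelta, int[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, docDelta);
        boolean withPositions = features.contains(Feature.POSITIONS);
        if (withPositions || features.contains(Feature.FREQS)) {
            // The count is needed to decode positions, so positions imply freq here.
            writeVInt(out, positions.length);
        }
        if (withPositions) {
            int prev = 0;
            for (int p : positions) {
                writeVInt(out, p - prev);
                prev = p;
            }
        }
        return out.toByteArray();
    }

    /** Variable-length int: 7 data bits per byte, high bit set means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

A decoder has to know the field's feature set before it reads the first byte, which is the wrinkle mentioned above: that information has to come from the field or the codec, since nothing in the posting itself announces it.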

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

[1] Brin/Page: "The Anatomy of a Large-Scale Hypertextual Web Search Engine" <http://dbpubs.stanford.edu:8090/pub/1998-8>.

