On Dec 21, 2006, at 1:58 PM, Ning Li wrote:

> Storing all the posting content, e.g. frequencies and positions, in a
> single file greatly simplifies things. However, this could cause some
> performance penalty. For example, boolean query 'Apache AND Lucene'
> would have to paw through positions. But position indexing for Apache
> and Lucene is necessary to support phrase query '"Apache Lucene"'.

Precision would be enhanced if boolean scoring took position into account, and could be further enhanced if each position were assigned a boost. For that purpose, having everything in one file is an advantage, as it cuts down disk seeks. Turn off freqs, positions, and boosts, and you have only doc_nums, which is ideal for matching rather than scoring, yielding a performance gain.
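
To make the trade-off concrete, here is a minimal sketch of a single interleaved posting stream in which freqs, positions, and boosts can each be switched off. It is only an illustration under assumed conventions: the names (Posting, encode, doc_nums), the variable-length integer layout, and the integer boosts are hypothetical and do not describe the actual Lucene or KinoSearch file formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Posting:
    doc_num: int
    positions: List[int]  # ascending token positions within the document
    boosts: List[int]     # one small integer boost per position (assumed)


def write_vint(out: bytearray, n: int) -> None:
    """Append n as a variable-length integer, 7 bits per byte."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def read_vint(buf: bytes, i: int) -> Tuple[int, int]:
    """Read one variable-length integer at offset i; return (value, next offset)."""
    n = shift = 0
    while True:
        b = buf[i]
        i += 1
        n |= (b & 0x7F) << shift
        if b < 0x80:
            return n, i
        shift += 7


def encode(postings: List[Posting], freqs=True, positions=True, boosts=True) -> bytes:
    """Write everything for one term into a single stream.

    Positions are only written when freqs are on, and boosts only when
    positions are on, so the reader always knows how many values follow.
    """
    out = bytearray()
    last_doc = 0
    for p in postings:
        write_vint(out, p.doc_num - last_doc)  # delta-encoded doc_num
        last_doc = p.doc_num
        if freqs:
            write_vint(out, len(p.positions))
            if positions:
                last_pos = 0
                for pos, boost in zip(p.positions, p.boosts):
                    write_vint(out, pos - last_pos)  # delta-encoded position
                    last_pos = pos
                    if boosts:
                        write_vint(out, boost)
    return bytes(out)


def doc_nums(buf: bytes, freqs=True, positions=True, boosts=True) -> List[int]:
    """Recover doc_nums only, skipping over whatever else the stream holds."""
    docs, i, doc = [], 0, 0
    while i < len(buf):
        delta, i = read_vint(buf, i)
        doc += delta
        docs.append(doc)
        if freqs:
            freq, i = read_vint(buf, i)
            if positions:
                # A matcher still has to step past every position (and boost).
                for _ in range(freq * (2 if boosts else 1)):
                    _, i = read_vint(buf, i)
    return docs


def match_and(a: List[int], b: List[int]) -> List[int]:
    """Boolean AND over two doc_num lists; matching only, no scoring."""
    return sorted(set(a) & set(b))


apache = [Posting(1, [0, 7], [1, 1]), Posting(4, [3], [2]), Posting(9, [2, 5, 8], [1, 1, 3])]
lucene = [Posting(4, [4], [1]), Posting(9, [6], [1]), Posting(12, [0], [1])]

full = encode(apache)                                            # everything on
lean = encode(apache, freqs=False, positions=False, boosts=False)  # doc_nums only
hits = match_and(doc_nums(full), doc_nums(encode(lucene)))
```

Both streams yield the same doc_nums for 'Apache AND Lucene', but the lean one is a fraction of the size and the matcher never has to step over positions or boosts.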

What's being considered isn't really motivated by improving existing core functionality, though. It's more about expanding the API to enable new applications.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


