On 12/22/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
Precision would be enhanced if boolean scoring took position into account, and could be further enhanced if each position were assigned a boost. For that purpose, having everything in one file is an advantage, as it cuts down disk seeks. Turn off freqs, positions, and boosts, and you have only doc_nums, which is ideal for matching rather than scoring, yielding a performance gain.
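To make the quoted layout concrete, here is a minimal sketch in Python of a single posting stream whose optional fields can be switched off. The names `Posting`, `PostingFormat`, and `encode` are hypothetical, invented for this illustration rather than taken from any existing library, and a plain list stands in for the single file. Delta encoding and variable-length integers, which a real format would likely use, are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Posting:
    """One document's entry for a term (hypothetical structure)."""
    doc_num: int
    positions: Sequence[int] = ()
    boosts: Sequence[float] = ()  # assumed: one boost per position


@dataclass(frozen=True)
class PostingFormat:
    """Switches for the optional fields; doc_nums are always written."""
    freqs: bool = True
    positions: bool = True
    boosts: bool = True


def encode(postings: Sequence[Posting], fmt: PostingFormat) -> List[Number]:
    """Flatten postings into one stream, skipping any disabled field."""
    stream: List[Number] = []
    for p in postings:
        stream.append(p.doc_num)
        if fmt.freqs:
            stream.append(len(p.positions))  # freq = number of positions
        if fmt.positions:
            stream.extend(p.positions)
        if fmt.boosts:
            stream.extend(p.boosts)
    return stream


postings = [
    Posting(doc_num=3, positions=(1, 7), boosts=(1.0, 0.5)),
    Posting(doc_num=9, positions=(4,), boosts=(2.0,)),
]

# Everything enabled: one stream carries all the data needed for scoring.
full = encode(postings, PostingFormat())

# Everything optional disabled: only doc_nums remain, enough for matching.
match_only = encode(
    postings, PostingFormat(freqs=False, positions=False, boosts=False)
)
```

With all optional fields off, the stream shrinks to the doc numbers alone, which is the matching-only case described above.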
I'm aware of this design. Boolean and phrase queries are an example. The point is that different queries will (continue to) require different information about terms, especially once flexible postings are allowed. The question is: should the number of files used to store postings be customizable?

Cheers,
Ning