Andrzej Bialecki wrote:
Renaud Delbru wrote:
Hi Andrzej,
sorry for the late reply.
I have looked at the code. As far as I understand, you sort the
posting lists based on the first doc skip: the first posting list
will be the one with the biggest first document skip.
Is the sparseness of posting lists a good predictor for sampling
and ordering posting lists? Do you know of any evaluation of such a technique?
It is _some_ predictor ... :) whether it's a good one is another
question. It's certainly very inexpensive - we don't do any additional
IO except what we have to do anyway, which is scorer.skipTo().
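As a rough illustration of that ordering, here is a minimal sketch: skip each
iterator to its first match once (the skipTo we have to do anyway) and lead
with the iterator whose first match is furthest ahead. The PostingIterator
interface and all names below are illustrative stand-ins, not the actual
Lucene Scorer API.

import java.util.*;

public class FirstSkipOrdering {

  // Hypothetical stand-in for a scorer / posting list iterator.
  interface PostingIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    // Advance to the first document >= target and return it, or NO_MORE_DOCS.
    int skipTo(int target);
  }

  // Simple array-backed implementation for demonstration.
  static PostingIterator fromDocs(final int... docs) {
    return target -> {
      for (int d : docs) {
        if (d >= target) return d;
      }
      return PostingIterator.NO_MORE_DOCS;
    };
  }

  // Skip every iterator to its first match once, then put the iterator whose
  // first match is furthest ahead (likely the sparsest list) in front.
  static List<PostingIterator> orderByFirstSkip(List<PostingIterator> its) {
    Map<PostingIterator, Integer> firstDoc = new HashMap<>();
    for (PostingIterator it : its) {
      firstDoc.put(it, it.skipTo(0));
    }
    List<PostingIterator> ordered = new ArrayList<>(its);
    ordered.sort((a, b) -> Integer.compare(firstDoc.get(b), firstDoc.get(a)));
    return ordered;
  }

  public static void main(String[] args) {
    PostingIterator dense  = fromDocs(0, 1, 2, 3, 4);
    PostingIterator sparse = fromDocs(42, 1000);
    List<PostingIterator> ordered =
        orderByFirstSkip(Arrays.asList(dense, sparse));
    System.out.println(ordered.get(0) == sparse); // true: sparse list leads
  }
}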
In the general case it's costly to calculate the frequency (or sparseness)
of matches in a scorer without actually running the scorer through all
its matches.
You can estimate the frequency for some scorers, such as
ConjunctiveScorer, DisjunctiveScorer, etc., as Paul Elschot explained in
the other reply.
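A rough sketch of what such an estimate could look like, computed from
per-term document frequencies alone: the minimum as an upper bound for a
conjunction, a capped sum for a disjunction, and a product of probabilities
under an independence assumption. This is my reading of that suggestion, not
code taken from Lucene.

public class MatchCountEstimate {

  // A conjunction can never match more documents than its rarest clause.
  static long conjunctionUpperBound(long[] docFreqs) {
    long min = Long.MAX_VALUE;
    for (long df : docFreqs) min = Math.min(min, df);
    return min;
  }

  // A disjunction matches at most the sum of its clauses, capped at maxDoc.
  static long disjunctionUpperBound(long[] docFreqs, long maxDoc) {
    long sum = 0;
    for (long df : docFreqs) sum += df;
    return Math.min(sum, maxDoc);
  }

  // Expected matches of a conjunction if the clauses were independent:
  // maxDoc * product(docFreq_i / maxDoc).
  static double conjunctionIndependentEstimate(long[] docFreqs, long maxDoc) {
    double p = 1.0;
    for (long df : docFreqs) p *= (double) df / maxDoc;
    return p * maxDoc;
  }

  public static void main(String[] args) {
    long[] dfs = {1000, 50, 200};
    long maxDoc = 100_000;
    System.out.println(conjunctionUpperBound(dfs));                  // 50
    System.out.println(disjunctionUpperBound(dfs, maxDoc));          // 1250
    System.out.println(conjunctionIndependentEstimate(dfs, maxDoc)); // ~0.001
  }
}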
Answering your question: the docFreq call uses TermInfo information, which
is backed by a small RAM cache. If you're lucky it won't cause any IO;
otherwise it needs to read this info from the .ti file.
Thanks for the clarification.
If we assume that a query is composed of only a few terms, this will
require, in the worst case, one IO access per term. I think the cost of
the additional IO access can be offset by the better prediction that the
frequency gives. This is something to benchmark / evaluate.
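A sketch of that ordering could be as simple as one docFreq lookup per query
term followed by a sort, leading the conjunction with the rarest term. The
lookup is passed in as a plain function so the snippet stays self-contained;
in Lucene it would wrap the reader's docFreq call mentioned above, and the
names here are illustrative.

import java.util.*;
import java.util.function.ToIntFunction;

public class DocFreqOrdering {

  // Order terms by ascending document frequency, i.e. rarest term first.
  static List<String> orderByDocFreq(List<String> terms,
                                     ToIntFunction<String> docFreq) {
    List<String> ordered = new ArrayList<>(terms);
    ordered.sort(Comparator.comparingInt(docFreq));
    return ordered;
  }

  public static void main(String[] args) {
    // Invented frequencies, standing in for one docFreq lookup per term.
    Map<String, Integer> fakeDocFreqs =
        Map.of("the", 90_000, "lucene", 1_200, "skipto", 40);
    List<String> ordered = orderByDocFreq(
        Arrays.asList("the", "lucene", "skipto"), fakeDocFreqs::get);
    System.out.println(ordered); // [skipto, lucene, the]
  }
}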
Regards
--
Renaud Delbru