Re: Sorting posting lists before intersection

Renaud Delbru Mon, 13 Oct 2008 07:23:18 -0700

Hi Andrzej,

Sorry for the late reply.

I have looked at the code. As far as I understand, you sort the posting lists based on the first doc skip. The first posting list will be the one that has the biggest first document skip. Is the sparseness of posting lists a good predictor for sampling and ordering posting lists? Do you know of any evaluation of such a technique?
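
To make sure I describe the heuristic correctly, here is a minimal sketch of how I read it. This is not the ConjunctionScorer code: the class FirstDocOrdering and the method orderByFirstDoc are hypothetical names, and posting lists are assumed to be non-empty, sorted int arrays of document ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only, not the Lucene implementation.
final class FirstDocOrdering {

    // Returns a copy of the posting lists ordered so that the list whose
    // first document id is largest comes first. A large first document id
    // is taken here as a cheap sample of how sparse the list is.
    // Assumption: every posting list is non-empty and sorted ascending.
    static List<int[]> orderByFirstDoc(List<int[]> postings) {
        List<int[]> ordered = new ArrayList<>(postings);
        ordered.sort(Comparator.comparingInt((int[] p) -> p[0]).reversed());
        return ordered;
    }
}
```

With this reading, a list starting at document 40 would be placed before lists starting at documents 7 and 1, whatever their total lengths are.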

In order to implement sorting based on frequency, we need the document frequency of each term. This information should be propagated through the Scorer classes (from TermScorer to higher-level classes such as ConjunctionScorer). This will require a call to IndexReader.docFreq(term) for each of the term queries. Does a docFreq call mean another IO access?
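
For reference, the frequency-based ordering I have in mind could be sketched as follows. This is only an illustration under stated assumptions: posting lists are plain sorted int arrays, the array length stands in for the value that IndexReader.docFreq(term) would return in a real index, and the class FrequencyOrderedIntersection and its methods are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only, not part of Lucene.
final class FrequencyOrderedIntersection {

    // Intersects two sorted document id arrays with a linear merge.
    static int[] intersectPair(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Orders the posting lists by ascending document frequency (here the
    // array length), then intersects them starting from the rarest term so
    // that the intermediate result stays as small as possible.
    static int[] intersect(List<int[]> postings) {
        if (postings.isEmpty()) {
            return new int[0];
        }
        List<int[]> ordered = new ArrayList<>(postings);
        ordered.sort(Comparator.comparingInt((int[] p) -> p.length));
        int[] result = ordered.get(0);
        // Stop early once the running intersection is empty.
        for (int k = 1; k < ordered.size() && result.length > 0; k++) {
            result = intersectPair(result, ordered.get(k));
        }
        return result;
    }
}
```

The point of the ordering is that the rarest term bounds the size of the result, so starting from it keeps every later merge short.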


Thanks for the clarification,
Regards.
--
Renaud Delbru


Andrzej Bialecki wrote:


Renaud Delbru wrote:
> Hi all,
>
> I am wondering if Lucene implements the query optimisation that consists
> of ordering the posting lists based on the term frequency before
> intersection ?
> If yes, could somebody point me to the java class / method that
> implements such strategy ?

Lucene trunk: ConjunctionScorer, lines 85-103 - pay attention to the
comments there, it's not strictly a sort by frequency, rather by the
sampled "sparseness".

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

