Re: posting list traversal code

Adrien Grand Thu, 13 Jun 2013 00:21:35 -0700

Hi,

On Thu, Jun 13, 2013 at 8:24 AM, Denis Bazhenov <dot...@gmail.com> wrote:
> A document ID at the index level is the offset of the document in the index.
> It can change over time for the same document, for example when several
> segments are merged. Doc IDs are also stored in increasing order in posting
> lists, which allows fast posting list intersection. Some Lucene APIs
> explicitly state that they operate on document IDs in order (like TermDocs);
> others allow out-of-order processing (like Collector). So it really depends.
>
> In the case of SortingAtomicReader, as far as I know, it calculates a
> document permutation that yields sorted doc IDs on output. So it
> basically relabels documents.


This is correct. The org.apache.lucene.index.sorter.Sorter.sort method
computes a permutation of the doc IDs that puts them in the requested
sort order. SortingAtomicReader is just a view over an AtomicReader
that uses this permutation to relabel doc IDs and give the impression
that the index is sorted. But this class is not very interesting by
itself and can be very slow to decode postings: for each term, it needs
to load all postings into memory and sort them before returning an
enumeration of the doc IDs (see the SortingDocsEnum class). It is only
useful for sorting indices offline with IndexWriter.addIndexes or online
with SortingMergePolicy.
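
To illustrate why the relabeling forces a per-term sort, here is a minimal
sketch in plain Java. It does not use any Lucene classes; the class and
method names (SortedViewSketch, computeOldToNew, remapPostings) and the
sample values are made up for illustration and are not the actual
Sorter or SortingDocsEnum implementation.

```java
import java.util.Arrays;

// Hypothetical illustration only: not Lucene code.
public class SortedViewSketch {

    /**
     * Computes a permutation from per-document sort values.
     * oldToNew[oldDocId] is the doc ID the document gets in the sorted view.
     */
    public static int[] computeOldToNew(long[] sortValues) {
        Integer[] newToOld = new Integer[sortValues.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Arrays.sort on objects is stable, so ties keep their original order.
        Arrays.sort(newToOld, (a, b) -> Long.compare(sortValues[a], sortValues[b]));

        int[] oldToNew = new int[sortValues.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /**
     * Relabels one term's postings. The input is in increasing doc ID order,
     * but the relabeled IDs generally are not, so the whole list has to be
     * held in memory and sorted before it can be enumerated in order.
     */
    public static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] remapped = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            remapped[i] = oldToNew[postings[i]];
        }
        Arrays.sort(remapped);
        return remapped;
    }

    public static void main(String[] args) {
        long[] sortValues = {30, 10, 40, 20};   // one sort value per original doc ID
        int[] oldToNew = computeOldToNew(sortValues);
        int[] postings = {0, 2, 3};             // original postings of one term
        System.out.println(Arrays.toString(oldToNew));
        System.out.println(Arrays.toString(remapPostings(postings, oldToNew)));
    }
}
```

With these sample values the postings 0, 2, 3 are relabeled to 2, 3, 1,
which is no longer in order; that extra sort, repeated for every term, is
the cost mentioned above.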

--
Adrien
