Hi,

On Thu, Jun 13, 2013 at 8:24 AM, Denis Bazhenov <dot...@gmail.com> wrote:
> Document id on the index level is offset of the document in the index. It can
> change over time for the same document, for example when merging several
> segments. They are also stored in order in posting lists. This allows fast
> posting list intersection. Some Lucene API's explicitly state that they
> operate on the document ids in order (like TermDocs), some allows out of
> order processing (like Collector). So it really depends.
>
> In case of SortingAtomicReader, as far as I know, it calculate document
> permutation, which allows to have sorted docIDs on the output. So, it
> basically relabel documents.

This is correct. The org.apache.lucene.index.sorter.Sorter.sort method computes a permutation of the doc IDs that puts them in the requested sort order. SortingAtomicReader is just a view over an AtomicReader that uses this permutation to relabel doc IDs and give the impression that the index is sorted.

But this class is not very interesting by itself and can be very slow to decode postings: for each term, it needs to load all postings into memory and sort them before returning an enumeration of the doc IDs (see the SortingDocsEnum class). It is only useful for sorting indices offline with IndexWriter.addIndexes or online with SortingMergePolicy.

-- 
Adrien
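
To illustrate the relabeling described above, here is a minimal sketch in plain Java. It does not use any Lucene classes: the class name DocIdPermutationSketch, the methods computeOldToNew and remapPostings, and the long sort keys are all hypothetical and chosen only for illustration. It shows the general idea of computing a doc ID permutation from a sort key, and why a remapped postings list has to be fully loaded and re-sorted before it can be enumerated in doc ID order. It is not how Sorter or SortingDocsEnum are actually implemented.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; none of these names exist in Lucene.
public class DocIdPermutationSketch {

    /**
     * Computes a permutation such that oldToNew[oldDocId] is the doc ID
     * the document would get if the index were sorted by sortKeys.
     */
    public static int[] computeOldToNew(final long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Stable sort of the old doc IDs by their sort key.
        Arrays.sort(newToOld, Comparator.comparingLong(i -> sortKeys[i]));

        // Invert newToOld to get oldToNew.
        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /**
     * Relabels one term's postings with the permutation. The relabeled
     * doc IDs are generally no longer increasing, so the whole list has
     * to be materialized and sorted before it can be iterated in order.
     */
    public static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] remapped = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            remapped[i] = oldToNew[postings[i]];
        }
        Arrays.sort(remapped);
        return remapped;
    }

    public static void main(String[] args) {
        long[] sortKeys = {30, 10, 20, 5};   // assumed per-document sort values
        int[] oldToNew = computeOldToNew(sortKeys);
        System.out.println(Arrays.toString(oldToNew));

        int[] postings = {0, 1};             // old doc IDs containing some term
        System.out.println(Arrays.toString(remapPostings(postings, oldToNew)));
    }
}
```

The per-term copy and sort in remapPostings is the cost that makes a sorted view slow at search time, which is why it makes more sense to pay that cost once while writing a new, physically sorted index.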