s1monw commented on code in PR #12829:
URL: https://github.com/apache/lucene/pull/12829#discussion_r1423606084
##########
lucene/core/src/java/org/apache/lucene/index/IndexingChain.java:
##########
@@ -219,15 +222,33 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState state) throws IOExcepti
     }
 
     LeafReader docValuesReader = getDocValuesLeafReader();
-
+    Function<IndexSorter.DocComparator, IndexSorter.DocComparator> comparatorWrapper = in -> in;
+
+    if (state.segmentInfo.getHasBlocks() && indexSort.getParentField() != null) {
+      final DocIdSetIterator readerValues =
+          docValuesReader.getNumericDocValues(indexSort.getParentField());
+      BitSet parents = BitSet.of(readerValues, state.segmentInfo.maxDoc());
+      comparatorWrapper =
+          in ->
+              (docID1, docID2) ->
+                  in.compare(parents.nextSetBit(docID1), parents.nextSetBit(docID2));
+    }
+    assert state.segmentInfo.getHasBlocks() == false
+            || indexSort.getParentField() != null
+            || indexCreatedVersionMajor < Version.LUCENE_10_0_0.major
+        : "parent field is not set but the index has blocks. indexCreatedVersionMajor: "
+            + indexCreatedVersionMajor;
     List<IndexSorter.DocComparator> comparators = new ArrayList<>();
     for (int i = 0; i < indexSort.getSort().length; i++) {
       SortField sortField = indexSort.getSort()[i];
       IndexSorter sorter = sortField.getIndexSorter();
       if (sorter == null) {
         throw new UnsupportedOperationException("Cannot sort index using sort field " + sortField);
       }
-      comparators.add(sorter.getDocComparator(docValuesReader, state.segmentInfo.maxDoc()));
+
+      IndexSorter.DocComparator docComparator =

Review Comment:
@msokolov This is basically what I had in my first version of this. There are a couple of issues with it:

- We can't execute arbitrary queries as a sort supplier, since the data structures inside DWPT don't support this. In fact, we can only really access DV in such a fashion. With a non-trivial amount of work we could likely walk a postings list, but executing a query, i.e. having an IndexReader on top of DWPT, would be a lot of work.
- A custom comparator also needs a field, a type, etc.;
that is more to configure and store in the index than just a field name whose type and content IW fully controls.

I am still under the impression that you think this change dictates a type and name of a parent field for the application that uses Lucene. It doesn't. You can think of this as a purely internal field. You don't have to use it for your application or to model your block structure. It only marks the end of the block so that a sort doesn't break it. It's basically an index-level guarantee backing the API guarantee we provide.

We do not need to model sub-blocks here, since the order of the docs must not be changed, not even by a sort? If docs need to be sorted, then only within the block, and that can / should happen before the block is passed to the IW?

I hope this makes sense?
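To illustrate why marking only the end of each block is enough, here is a minimal, self-contained sketch of the comparator-wrapping idea from the diff. It is not the Lucene implementation: `BlockSortSketch`, its nested `DocComparator` interface, and the sample values are hypothetical, and `java.util.BitSet` stands in for Lucene's own `BitSet`. The sketch assumes the last doc of every block is flagged as a parent, so `nextSetBit(doc)` always resolves to that block's parent.

```java
import java.util.Arrays;
import java.util.BitSet;

final class BlockSortSketch {

  /** Hypothetical stand-in for IndexSorter.DocComparator. */
  interface DocComparator {
    int compare(int docID1, int docID2);
  }

  /**
   * Wraps a comparator so every doc is compared via the parent that closes
   * its block, mirroring the comparatorWrapper lambda in the diff.
   */
  static DocComparator wrapWithParents(DocComparator in, BitSet parents) {
    return (docID1, docID2) ->
        in.compare(parents.nextSetBit(docID1), parents.nextSetBit(docID2));
  }

  /** Returns doc IDs 0..maxDoc-1 in sorted order, ties broken by doc ID. */
  static int[] sortedDocs(int maxDoc, DocComparator comparator) {
    Integer[] docs = new Integer[maxDoc];
    for (int i = 0; i < maxDoc; i++) {
      docs[i] = i;
    }
    Arrays.sort(
        docs,
        (a, b) -> {
          int c = comparator.compare(a, b);
          // Equal keys keep their original relative order, so docs of the
          // same block stay in the order they were indexed.
          return c != 0 ? c : Integer.compare(a, b);
        });
    return Arrays.stream(docs).mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    // Three blocks: [0,1,2], [3,4], [5]; the last doc of each is the parent.
    long[] values = {5, 99, 30, 50, 10, 20};
    BitSet parents = new BitSet();
    parents.set(2);
    parents.set(4);
    parents.set(5);

    DocComparator byValue = (a, b) -> Long.compare(values[a], values[b]);

    // Unwrapped: blocks are torn apart.
    System.out.println(Arrays.toString(sortedDocs(6, byValue)));
    // prints [0, 4, 5, 2, 3, 1]

    // Wrapped: whole blocks move together, ordered by their parent's value.
    System.out.println(Arrays.toString(sortedDocs(6, wrapWithParents(byValue, parents))));
    // prints [3, 4, 5, 0, 1, 2]
  }
}
```

In the wrapped case each child inherits its parent's sort key, so a block can only move as a unit and the order inside it is untouched. That is the sense in which the parent field acts as an internal end-of-block marker and does not model the application's block structure.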