mikemccand commented on a change in pull request #442: URL: https://github.com/apache/lucene/pull/442#discussion_r750653939
##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {

Review comment:
We can do this because the taxo index ensures that new ordinals are appended into new segments at the end of the index, on each refresh? E.g. it's using `LogDocMergePolicy` or `LogByteSizeMergePolicy`?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {
+        // skip this leaf if it does not contain new categories
+        continue;
+      }
+      NumericDocValues parentValues =
+          leafContext.reader().getNumericDocValues(Consts.FIELD_PARENT_ORDINAL_NDV);

Review comment:
Maybe also `null`-check `parentValues` and throw `CorruptIndexException` if so?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
+    if (tryLoadParentUsingTermPosition(reader, first)) {

Review comment:
Could we maybe make this more explicit? Check the created version from `SegmentInfos`?
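The leaf-skipping check and the per-leaf start offset quoted in the hunks above can be traced with plain integers. The sketch below is only an illustration and uses no Lucene classes: `Leaf` is a hypothetical stand-in for `LeafReaderContext` that carries just `docBase` and `maxDoc`, and it assumes (as the error message in the diff suggests) that a category ordinal equals `docBase + doc`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone illustration of the loop bounds; not the PR's implementation. */
final class LeafRangeSketch {

  /** Hypothetical stand-in for LeafReaderContext: only docBase and maxDoc. */
  static final class Leaf {
    final int docBase;
    final int maxDoc;

    Leaf(int docBase, int maxDoc) {
      this.docBase = docBase;
      this.maxDoc = maxDoc;
    }
  }

  /** Returns the global doc IDs (assumed to be ordinals) >= first that the loop visits. */
  static List<Integer> ordinalsToLoad(List<Leaf> leaves, int first) {
    List<Integer> visited = new ArrayList<>();
    for (Leaf leaf : leaves) {
      if (leaf.docBase + leaf.maxDoc <= first) {
        // Every doc in this leaf is below 'first', so nothing new to load.
        continue;
      }
      // Start inside the leaf at 'first' if it falls here, otherwise at doc 0.
      for (int doc = Math.max(first - leaf.docBase, 0); doc < leaf.maxDoc; doc++) {
        visited.add(leaf.docBase + doc);
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    // Three leaves covering global docs 0..3, 4..6 and 7..8.
    List<Leaf> leaves = Arrays.asList(new Leaf(0, 4), new Leaf(4, 3), new Leaf(7, 2));
    System.out.println(ordinalsToLoad(leaves, 5)); // prints [5, 6, 7, 8]
  }
}
```

With `first = 5`, the first leaf is skipped entirely, the second is entered at local doc 1, and the third is read from local doc 0. This only holds if existing ordinals keep their global doc IDs across refreshes, which is the ordering guarantee the first review comment asks about.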
##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {
+        // skip this leaf if it does not contain new categories
+        continue;
+      }
+      NumericDocValues parentValues =
+          leafContext.reader().getNumericDocValues(Consts.FIELD_PARENT_ORDINAL_NDV);
+      for (int doc = Math.max(first - leafContext.docBase, 0); doc < leafDocNum; doc++) {
+        if (parentValues.advanceExact(doc) == false) {
+          throw new CorruptIndexException(
+              "Missing parent data for category " + doc + leafContext.docBase, reader.toString());

Review comment:
Hmm maybe put parens around `(doc + leafContext.docBase)`? I'm not sure about the order-of-concatenation-or-addition here.

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java
##########
@@ -161,12 +165,14 @@ public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode, TaxonomyW
       indexEpoch = 1;
       // no commit exists so we can safely use the new BinaryDocValues field
       useOlderStoredFieldIndex = false;
+      useOlderTermPositionForParentOrdinal = false;
     } else {
       String epochStr = null;
       SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
       /* a previous commit exists, so check the version of the last commit */
       useOlderStoredFieldIndex = infos.getIndexCreatedVersionMajor() <= 8;
+      useOlderTermPositionForParentOrdinal = infos.getIndexCreatedVersionMajor() <= 8;

Review comment:
So this means, even if you are running Lucene 9.x, if you open an older (created in Lucene 8.x) index, you will continue to use the "old way" (hacky custom positions).
I like that approach (versus going segment by segment, which gets trickier). One must create a new 9.x index to get the new DV encoding.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
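On the parenthesization question raised for the `CorruptIndexException` message: in Java, `+` is left-associative, so once the left operand is a `String`, each following `int` is appended as text rather than added. A minimal stand-alone sketch (the class and method names are hypothetical, and the values 3 and 40 are made up for illustration):

```java
/** Shows how operand grouping changes a concatenated error message. */
final class ConcatPrecedenceSketch {

  /** Mirrors the expression in the diff: both ints are appended as text. */
  static String withoutParens(int doc, int docBase) {
    return "Missing parent data for category " + doc + docBase;
  }

  /** Parenthesized form: the ints are summed first, then appended. */
  static String withParens(int doc, int docBase) {
    return "Missing parent data for category " + (doc + docBase);
  }

  public static void main(String[] args) {
    System.out.println(withoutParens(3, 40)); // Missing parent data for category 340
    System.out.println(withParens(3, 40));    // Missing parent data for category 43
  }
}
```

So the unparenthesized message would report the local doc ID and the doc base glued together, while the parenthesized form reports their sum, which is what the reviewer's suggestion produces.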