[GitHub] [lucene] mikemccand commented on a change in pull request #442: LUCENE-10122 Use NumericDocValue to store taxonomy parent array

GitBox Tue, 16 Nov 2021 14:13:48 -0800


mikemccand commented on a change in pull request #442:
URL: https://github.com/apache/lucene/pull/442#discussion_r750653939




##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
 
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {

Review comment:
       We can do this because the taxo index ensures that new ordinals are 
appended into new segments at the end of the index, on each refresh?
   
   E.g. it's using `LogDocMergePolicy` or `LogByteSizeMergePolicy`?
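
   The skip condition in the hunk above can be sketched in isolation. This is only an illustration, assuming (as the question does) that taxonomy ordinals equal global doc IDs and that new ordinals only land in trailing segments. `LeafSkipSketch`, `leavesToScan`, and the `maxDocs` array are hypothetical stand-ins for iterating `reader.leaves()`, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the loop over reader.leaves(); no Lucene types are used.
class LeafSkipSketch {
  /**
   * For each leaf that may hold ordinals >= first, returns {leafIndex, startLocalDoc}.
   * maxDocs[i] plays the role of leafContext.reader().maxDoc() for leaf i.
   */
  static List<int[]> leavesToScan(int[] maxDocs, int first) {
    List<int[]> result = new ArrayList<>();
    int docBase = 0; // running doc base, as in leafContext.docBase
    for (int i = 0; i < maxDocs.length; i++) {
      int leafDocNum = maxDocs[i];
      // Same test as the diff, inverted: keep the leaf only if it extends past `first`.
      if (docBase + leafDocNum > first) {
        result.add(new int[] {i, Math.max(first - docBase, 0)});
      }
      docBase += leafDocNum;
    }
    return result;
  }
}
```

   With leaves of 5, 3 and 4 docs and `first = 6`, the first leaf is skipped, the second is scanned from local doc 1, and the third from local doc 0. The shortcut is only safe if no merge moves an older ordinal to a different global doc ID.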

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
 
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {
+        // skip this leaf if it does not contain new categories
+        continue;
+      }
+      NumericDocValues parentValues =
+          leafContext.reader().getNumericDocValues(Consts.FIELD_PARENT_ORDINAL_NDV);

Review comment:
       Maybe also `null`-check `parentValues` and throw `CorruptIndexException` 
if it is `null`?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
 
+    if (tryLoadParentUsingTermPosition(reader, first)) {

Review comment:
       Could we maybe make this more explicit?  Check the created version from 
`SegmentInfos`?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,15 +125,49 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
 
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      // The parent array is already loaded
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {
+        // skip this leaf if it does not contain new categories
+        continue;
+      }
+      NumericDocValues parentValues =
+          leafContext.reader().getNumericDocValues(Consts.FIELD_PARENT_ORDINAL_NDV);
+      for (int doc = Math.max(first - leafContext.docBase, 0); doc < leafDocNum; doc++) {
+        if (parentValues.advanceExact(doc) == false) {
+          throw new CorruptIndexException(
+              "Missing parent data for category " + doc + leafContext.docBase, reader.toString());

Review comment:
       Hmm maybe put parens around `(doc + leafContext.docBase)`?  I'm not sure 
about the order-of-concatenation-or-addition here.
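
   For reference, Java evaluates `+` left to right, so once the left operand is a `String` every following `+` is concatenation. A minimal sketch of the difference follows; the class and method names are hypothetical, and only the message text is taken from the diff.

```java
// Hypothetical demo class; it mirrors the message expression from the diff.
class ConcatOrderSketch {
  // As written in the diff: ("..." + doc) + docBase, i.e. two concatenations.
  static String withoutParens(int doc, int docBase) {
    return "Missing parent data for category " + doc + docBase;
  }

  // With parentheses: the ints are added first, then concatenated once.
  static String withParens(int doc, int docBase) {
    return "Missing parent data for category " + (doc + docBase);
  }
}
```

   With `doc = 3` and `docBase = 10`, the first form ends in `310` and the second in `13`, so the parentheses are needed for the message to report the global ordinal.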

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java
##########
@@ -161,12 +165,14 @@ public DirectoryTaxonomyWriter(Directory directory, OpenMode openMode, TaxonomyW
       indexEpoch = 1;
       // no commit exists so we can safely use the new BinaryDocValues field
       useOlderStoredFieldIndex = false;
+      useOlderTermPositionForParentOrdinal = false;
     } else {
       String epochStr = null;
 
       SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
       /* a previous commit exists, so check the version of the last commit */
       useOlderStoredFieldIndex = infos.getIndexCreatedVersionMajor() <= 8;
+      useOlderTermPositionForParentOrdinal = infos.getIndexCreatedVersionMajor() <= 8;

Review comment:
       So this means, even if you are running Lucene 9.x, if you open an older 
(created in Lucene 8.x) index, you will continue to use the "old way" (hacky 
custom positions).  I like that approach (versus going segment by segment which 
gets trickier).
   
   One must create a new 9.x index to get the new DV encoding.
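
   The index-level decision described above can be sketched as a single function of the created-version major. This is an illustration only: `ParentEncodingSketch`, `Encoding`, and `chooseEncoding` are hypothetical names, and the only detail taken from the diff is the `<= 8` comparison on `infos.getIndexCreatedVersionMajor()`.

```java
// Hypothetical sketch of the index-level switch; not the PR's actual code.
class ParentEncodingSketch {
  enum Encoding { TERM_POSITIONS, NUMERIC_DOC_VALUES }

  // indexCreatedVersionMajor stands in for infos.getIndexCreatedVersionMajor().
  static Encoding chooseEncoding(int indexCreatedVersionMajor) {
    // Indexes created on 8.x or earlier keep the old encoding, whatever version opens them.
    return indexCreatedVersionMajor <= 8 ? Encoding.TERM_POSITIONS : Encoding.NUMERIC_DOC_VALUES;
  }
}
```

   Because the choice depends only on the version that created the index, every segment in one index uses the same encoding, which avoids the per-segment bookkeeping mentioned above.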




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


