[GitHub] [lucene] mayya-sharipova commented on a change in pull request #11: LUCENE-9334 Consistency of field data structures

GitBox Sun, 21 Mar 2021 07:50:52 -0700


mayya-sharipova commented on a change in pull request #11:
URL: https://github.com/apache/lucene/pull/11#discussion_r598290999




##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexingChain.java
##########
@@ -1313,4 +1307,118 @@ public void recycleIntBlocks(int[][] blocks, int offset, int length) {
       bytesUsed.addAndGet(-(length * (IntBlockPool.INT_BLOCK_SIZE * Integer.BYTES)));
     }
   }
+
+  /**
+   * A schema of the field in the current document. With every new document this schema is reset. As
+   * the document fields are processed, we update the schema with options encountered in this
+   * document. Once the processing for the document is done, we compare the built schema of the
+   * current document with the corresponding FieldInfo (FieldInfo is built on a first document in
+   * the segment where we encounter this field). If there is inconsistency, we raise an error. This
+   * ensures that a field has the same data structures across all documents.
+   */
+  private static final class FieldSchema {

Review comment:
       @s1monw Yes, our intention is for `FieldInfo` to be set up by the first doc 
in the index that contains this field, and never to change after that for 
the whole index. Only `attributes` can be combined across several segments.
   It would be great to make `FieldInfo` fully immutable in the future.
   
   But `FieldSchema` has a different purpose. It is reset with every document 
and is built up as we encounter IndexableFields in a doc. After that, we 
compare the built `FieldSchema` with the expected `FieldInfo` to detect extra or 
missing data structures. For example, if the `FieldInfo` for `field1` is set up to be 
indexed with docValues and points, but `docX` has built a `FieldSchema` for 
`field1` that contains only docValues, we raise an error and abort the indexing of 
docX.
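
   A rough sketch of that per-document check, to make the split between the two 
classes concrete. All names here (`FieldSchemaSketch`, `ExpectedInfo`, `DocSchema`, 
`assertSame`) are hypothetical stand-ins and not the actual code in 
`IndexingChain`; only docValues and points are modeled.

```java
import java.util.Objects;

/** Hypothetical, simplified sketch; none of these names are Lucene's actual API. */
final class FieldSchemaSketch {

  /** Stand-in for FieldInfo: fixed by the first document that contains the field. */
  static final class ExpectedInfo {
    final String name;
    final boolean docValues;
    final boolean points;

    ExpectedInfo(String name, boolean docValues, boolean points) {
      this.name = Objects.requireNonNull(name);
      this.docValues = docValues;
      this.points = points;
    }
  }

  /** Stand-in for FieldSchema: reset for every document, filled in as fields are seen. */
  static final class DocSchema {
    final String name;
    boolean docValues;
    boolean points;

    DocSchema(String name) {
      this.name = Objects.requireNonNull(name);
    }

    /** Clears everything recorded for the previous document. */
    void reset() {
      docValues = false;
      points = false;
    }

    /** Throws if this document's data structures differ from the expected ones. */
    void assertSame(ExpectedInfo expected) {
      if (docValues != expected.docValues) {
        throw new IllegalArgumentException(
            "field \"" + name + "\": docValues expected=" + expected.docValues
                + " but document has " + docValues);
      }
      if (points != expected.points) {
        throw new IllegalArgumentException(
            "field \"" + name + "\": points expected=" + expected.points
                + " but document has " + points);
      }
    }
  }
}
```

   With an `ExpectedInfo("field1", true, true)` and a `DocSchema` for `docX` where 
only `docValues` was set, `assertSame` throws, which corresponds to aborting the 
indexing of docX in the example above.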
   
   Please let me know if this makes sense or you see how it can be organized 
better.
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

