mayya-sharipova commented on a change in pull request #11:
URL: https://github.com/apache/lucene/pull/11#discussion_r598290999

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexingChain.java
##########
@@ -1313,4 +1307,118 @@ public void recycleIntBlocks(int[][] blocks, int offset, int length) {
       bytesUsed.addAndGet(-(length * (IntBlockPool.INT_BLOCK_SIZE * Integer.BYTES)));
     }
   }
+
+  /**
+   * A schema of the field in the current document. With every new document this schema is reset. As
+   * the document fields are processed, we update the schema with options encountered in this
+   * document. Once the processing for the document is done, we compare the built schema of the
+   * current document with the corresponding FieldInfo (FieldInfo is built on a first document in
+   * the segment where we encounter this field). If there is inconsistency, we raise an error. This
+   * ensures that a field has the same data structures across all documents.
+   */
+  private static final class FieldSchema {

Review comment:
@s1monw Yes, our intention is for `FieldInfo` to be set up by the first doc in the index that contains this field, and after that never to change for the whole index. Only `attributes` can be combined across several segments. It would be great in the future to make `FieldInfo` fully immutable.

But `FieldSchema` has a different purpose. It is reset with every document and is built up as we encounter IndexableFields in a doc. After that we compare the built `FieldSchema` with the expected `FieldInfo` for extra or missing fields.

For example, if `FieldInfo` for `field1` is set up to be indexed with docValues and points, but the `FieldSchema` built for `field1` in `docX` contains only docValues, we raise an error and abort the indexing of `docX`.

Please let me know if this makes sense or if you see how it can be organized better.
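
To make the per-document check described in the review comment concrete, here is a minimal, self-contained sketch. It is not the code from this pull request: `FieldSchemaSketch`, `Structure`, `ExpectedField` (standing in for `FieldInfo`), `DocFieldSchema` (standing in for `FieldSchema`), and the choice of `IllegalArgumentException` are all hypothetical names and assumptions made for illustration. It only shows the reset, build, and compare cycle for the `field1` / `docX` example.

```java
import java.util.EnumSet;
import java.util.Set;

public class FieldSchemaSketch {

  /** Hypothetical set of per-field data structures; not Lucene's actual representation. */
  enum Structure { INVERTED_INDEX, DOC_VALUES, POINTS }

  /** Stand-in for FieldInfo: fixed by the first document that contains the field. */
  static final class ExpectedField {
    final String name;
    final EnumSet<Structure> structures = EnumSet.noneOf(Structure.class);

    ExpectedField(String name, Set<Structure> structures) {
      this.name = name;
      this.structures.addAll(structures);
    }
  }

  /** Stand-in for FieldSchema: reset for every document, then filled as fields are seen. */
  static final class DocFieldSchema {
    private final String name;
    private final EnumSet<Structure> seen = EnumSet.noneOf(Structure.class);

    DocFieldSchema(String name) {
      this.name = name;
    }

    /** Called at the start of each new document. */
    void reset() {
      seen.clear();
    }

    /** Called for each indexable field of this name encountered in the document. */
    void record(Structure structure) {
      seen.add(structure);
    }

    /** Called once the document is processed; rejects extra or missing structures. */
    void assertSame(ExpectedField expected) {
      if (!seen.equals(expected.structures)) {
        throw new IllegalArgumentException(
            "field \"" + name + "\" expected " + expected.structures
                + " but this document provides " + seen);
      }
    }
  }

  public static void main(String[] args) {
    // field1 was set up by an earlier document with docValues and points.
    ExpectedField field1 =
        new ExpectedField("field1", EnumSet.of(Structure.DOC_VALUES, Structure.POINTS));
    DocFieldSchema schema = new DocFieldSchema("field1");

    // A consistent document: both structures are present, so the check passes.
    schema.reset();
    schema.record(Structure.DOC_VALUES);
    schema.record(Structure.POINTS);
    schema.assertSame(field1);

    // docX: only docValues are present, so the document is rejected.
    schema.reset();
    schema.record(Structure.DOC_VALUES);
    try {
      schema.assertSame(field1);
    } catch (IllegalArgumentException e) {
      System.out.println("docX rejected: " + e.getMessage());
    }
  }
}
```

In this sketch the expected side is never modified after construction, which mirrors the stated goal of treating `FieldInfo` as effectively immutable, while the per-document side is cheap to clear and reuse.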