Guosmilesmile commented on code in PR #15776:
URL: https://github.com/apache/iceberg/pull/15776#discussion_r3005029517


##########
orc/src/main/java/org/apache/iceberg/orc/ORC.java:
##########
@@ -787,11 +788,24 @@ ReadBuilder constantFieldIds(Set<Integer> newConstantFieldIds) {
 
     public <D> CloseableIterable<D> build() {
       Preconditions.checkNotNull(schema, "Schema is required");
+      Set<Integer> topLevelIdsToExclude =
+          Sets.difference(
+              Sets.union(constantFieldIds, MetadataColumns.metadataFieldIds()),
+              ImmutableSet.of(
+                  MetadataColumns.ROW_ID.fieldId(),
+                  MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId()));

Review Comment:
   In the original logic, `idsToExclude` is built without needing the metadata 
field IDs, and fields are removed purely by `fieldIds`. However, 
`TypeUtil.selectNot` still extracts `3: id_bucket` from `_partition`, so the 
removal is incomplete and leaves redundant fields behind. In the `row_id` 
scenario, this extra field causes the ORC reader, which advances with 
`index++`, to become misaligned.
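
A toy sketch of the misalignment described above, using plain JDK collections rather than Iceberg or ORC classes. The class `PositionalReadSketch`, the method `pairByPosition`, the column names `id` and `data`, and the position of the leftover `id_bucket` are all hypothetical; the sketch only shows how a single leftover field shifts every later assignment when values are matched by a running `index++`.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalReadSketch {

  // Hypothetical stand-in for a reader that matches columns purely by position.
  // It is NOT the Iceberg ORC reader; it only models the index++ pattern.
  static List<String> pairByPosition(List<String> expectedColumns, List<String> actualColumns) {
    List<String> pairs = new ArrayList<>();
    int index = 0;
    for (String expected : expectedColumns) {
      // Take whatever sits at the current position, regardless of its name or ID.
      String actual = index < actualColumns.size() ? actualColumns.get(index) : "<missing>";
      pairs.add(expected + "<-" + actual);
      index++;
    }
    return pairs;
  }

  public static void main(String[] args) {
    List<String> expected = List.of("id", "data");

    // Clean removal: both sides line up.
    System.out.println(pairByPosition(expected, List.of("id", "data")));

    // Incomplete removal: a leftover field (assumed here to come first)
    // shifts every subsequent column by one position.
    System.out.println(pairByPosition(expected, List.of("id_bucket", "id", "data")));
  }
}
```

In this toy model the second call pairs `id` with `id_bucket` and `data` with `id`, which is the kind of shift that excluding fields only at the top level is meant to avoid.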



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

