Guosmilesmile commented on code in PR #15776:
URL: https://github.com/apache/iceberg/pull/15776#discussion_r3005029517
##########
orc/src/main/java/org/apache/iceberg/orc/ORC.java:
##########
@@ -787,11 +788,24 @@ ReadBuilder constantFieldIds(Set<Integer> newConstantFieldIds) {
public <D> CloseableIterable<D> build() {
Preconditions.checkNotNull(schema, "Schema is required");
+ Set<Integer> topLevelIdsToExclude =
+ Sets.difference(
+ Sets.union(constantFieldIds, MetadataColumns.metadataFieldIds()),
+ ImmutableSet.of(
+ MetadataColumns.ROW_ID.fieldId(),
+ MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId()));
Review Comment:
In the original logic, building `idsToExclude` did not need the metadata
field IDs, because fields were removed by `fieldIds`. However,
`TypeUtil.selectNot` still extracts `3: id_bucket` from `_partition`, so the
removal is incomplete and redundant fields are left behind. In the `row_id`
scenario, this causes the ORC reader, which advances through columns with
`index++`, to become misaligned.
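
The misalignment described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg or ORC APIs: `PositionalReadSketch` and `assignByPosition` are hypothetical names, the column lists are made up, and the position of the leftover `_partition.id_bucket` column is an assumption chosen only to show the effect of reading by `index++`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionalReadSketch {

  // Hypothetical helper: pairs each expected column with the next projected
  // column purely by position, the way an index++ style reader would.
  static Map<String, String> assignByPosition(List<String> expected, List<String> projected) {
    Map<String, String> result = new LinkedHashMap<>();
    int index = 0;
    for (String name : expected) {
      result.put(name, projected.get(index++));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> expected = List.of("id", "data");

    // Exclusion removed everything it was supposed to remove.
    List<String> clean = List.of("id", "data");

    // Assumption: a nested field survived the exclusion and sits before the
    // data columns, so every later position is shifted by one.
    List<String> leaky = List.of("_partition.id_bucket", "id", "data");

    System.out.println(assignByPosition(expected, clean)); // {id=id, data=data}
    System.out.println(assignByPosition(expected, leaky)); // {id=_partition.id_bucket, data=id}
  }
}
```

In the second call every expected column is paired with the wrong projected column, which is the kind of shift a positional reader cannot detect on its own.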
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]