anoopj commented on code in PR #16441:
URL: https://github.com/apache/iceberg/pull/16441#discussion_r3276416415
##########
core/src/main/java/org/apache/iceberg/StatsUtil.java:
##########
@@ -34,80 +34,67 @@
class StatsUtil {
private static final Logger LOG = LoggerFactory.getLogger(StatsUtil.class);
- // the number of reserved field IDs from the reserved field ID space as
defined in
- // https://iceberg.apache.org/spec/#reserved-field-ids
- static final int NUM_RESERVED_FIELD_IDS = 200;
- // the starting field ID of the reserved field ID space
- static final int RESERVED_FIELD_IDS_START = Integer.MAX_VALUE -
NUM_RESERVED_FIELD_IDS;
- // the number of supported stats per table column
+ private static final int FIRST_SUPPORTED_METADATA_FIELD_ID =
Review Comment:
To avoid future code drift, consider moving `SUPPORTED_METADATA_FIELD_IDS`
ahead, and then `FIRST_SUPPORTED_METADATA_FIELD_ID` would just be
`Collections.min(SUPPORTED_METADATA_FIELD_IDS)`
##########
core/src/main/java/org/apache/iceberg/StatsUtil.java:
##########
@@ -34,80 +34,67 @@
class StatsUtil {
private static final Logger LOG = LoggerFactory.getLogger(StatsUtil.class);
- // the number of reserved field IDs from the reserved field ID space as
defined in
- // https://iceberg.apache.org/spec/#reserved-field-ids
- static final int NUM_RESERVED_FIELD_IDS = 200;
- // the starting field ID of the reserved field ID space
- static final int RESERVED_FIELD_IDS_START = Integer.MAX_VALUE -
NUM_RESERVED_FIELD_IDS;
- // the number of supported stats per table column
+ private static final int FIRST_SUPPORTED_METADATA_FIELD_ID =
+ MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId();
+ static final Set<Integer> SUPPORTED_METADATA_FIELD_IDS =
+ Sets.newHashSet(
+ MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId(),
MetadataColumns.ROW_ID.fieldId());
static final int NUM_SUPPORTED_STATS_PER_COLUMN = 200;
- // the starting field ID of the stats space for data field IDs
+ static final int STATS_SPACE_FIELD_ID_START_FOR_METADATA_FIELDS = 9_000;
Review Comment:
Should we do a Precondition check in a static initializer block to make sure
`SUPPORTED_METADATA_FIELD_IDS` in future don't exceed 5 (the number of slots
available between 9K and 10K)?
##########
core/src/main/java/org/apache/iceberg/StatsUtil.java:
##########
@@ -34,80 +34,67 @@
class StatsUtil {
private static final Logger LOG = LoggerFactory.getLogger(StatsUtil.class);
- // the number of reserved field IDs from the reserved field ID space as
defined in
- // https://iceberg.apache.org/spec/#reserved-field-ids
- static final int NUM_RESERVED_FIELD_IDS = 200;
- // the starting field ID of the reserved field ID space
- static final int RESERVED_FIELD_IDS_START = Integer.MAX_VALUE -
NUM_RESERVED_FIELD_IDS;
- // the number of supported stats per table column
+ private static final int FIRST_SUPPORTED_METADATA_FIELD_ID =
+ MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId();
+ static final Set<Integer> SUPPORTED_METADATA_FIELD_IDS =
+ Sets.newHashSet(
Review Comment:
Use `ImmutableSet.of(..)`?
##########
core/src/test/java/org/apache/iceberg/TestStatsUtil.java:
##########
@@ -87,62 +91,63 @@ public void statsIdsForTableColumns() {
@Test
public void statsIdsOverflowForTableColumns() {
- // pick 100 random IDs that are > MAX_FIELD_ID and <
METADATA_SPACE_FIELD_ID_START as going over
+ // pick 100 random IDs that are > MAX_FIELD_ID and <
RESERVED_FIELD_IDS_START as going over
// the entire ID range takes too long
int invalidFieldId = -1;
for (int i = 0; i < 100; i++) {
int id =
ThreadLocalRandom.current()
- .nextInt(
- StatsUtil.MAX_DATA_FIELD_ID + 1,
- StatsUtil.STATS_SPACE_FIELD_ID_START_FOR_METADATA_FIELDS);
+ .nextInt(StatsUtil.MAX_DATA_FIELD_ID + 1,
RESERVED_FIELD_IDS_START);
assertThat(StatsUtil.statsFieldIdForField(id)).as("at pos %s",
id).isEqualTo(invalidFieldId);
}
assertThat(StatsUtil.fieldIdForStatsField(-1)).isEqualTo(invalidFieldId);
+ assertThat(StatsUtil.fieldIdForStatsField(0)).isEqualTo(invalidFieldId);
+ assertThat(StatsUtil.fieldIdForStatsField(200)).isEqualTo(invalidFieldId);
assertThat(StatsUtil.fieldIdForStatsField(5_000)).isEqualTo(invalidFieldId);
+
assertThat(StatsUtil.fieldIdForStatsField(8_600)).isEqualTo(invalidFieldId);
assertThat(StatsUtil.fieldIdForStatsField(10_001)).isEqualTo(invalidFieldId);
assertThat(StatsUtil.fieldIdForStatsField(10_201)).isEqualTo(invalidFieldId);
assertThat(StatsUtil.fieldIdForStatsField(10_500)).isEqualTo(invalidFieldId);
assertThat(StatsUtil.fieldIdForStatsField(10_900)).isEqualTo(invalidFieldId);
+
+ // stats field IDs at or above the exclusive upper bound are invalid
+
assertThat(StatsUtil.fieldIdForStatsField(StatsUtil.STATS_SPACE_FIELD_ID_END))
+ .isEqualTo(invalidFieldId);
+
assertThat(StatsUtil.fieldIdForStatsField(StatsUtil.STATS_SPACE_FIELD_ID_END +
200))
+ .isEqualTo(invalidFieldId);
+
assertThat(StatsUtil.fieldIdForStatsField(Integer.MAX_VALUE)).isEqualTo(invalidFieldId);
+
+ // field ID just past MAX_DATA_FIELD_ID is invalid
+ assertThat(StatsUtil.statsFieldIdForField(StatsUtil.MAX_DATA_FIELD_ID + 1))
+ .isEqualTo(invalidFieldId);
}
@Test
public void statsIdsForReservedColumns() {
Review Comment:
The name is a bit stale?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]