dramaticlly commented on code in PR #13785:
URL: https://github.com/apache/iceberg/pull/13785#discussion_r2279710048
##########
core/src/main/java/org/apache/iceberg/MetricsConfig.java:
##########
@@ -146,7 +146,7 @@ public Set<Integer> struct(Types.StructType struct,
Iterable<Set<Integer>> field
// visit children to add more ids
Iterator<Set<Integer>> iter = fieldResults.iterator();
- while (shouldContinue() && iter.hasNext()) {
+ while (iter.hasNext()) {
iter.next();
Review Comment:
Yeah this one took me a while to figure out. Let me share my findings
Visitor logic for struct is setup in `TypeUtil.visit`, for struct it will
prepare a VisitedFieldFuture list for all direct child fields, where
`VisitFieldFuture::get` will trigger the visitor of children of this given
struct. Both struct and list is passed to `CustomOrderSchemaVisitor` so one can
define traversal ordering.
```java
case STRUCT:
Types.StructType struct = type.asNestedType().asStructType();
List<VisitFieldFuture<T>> results =
Lists.newArrayListWithExpectedSize(struct.fields().size());
for (Types.NestedField field : struct.fields()) {
results.add(new VisitFieldFuture<>(field, visitor));
}
return visitor.struct(struct, Iterables.transform(results,
VisitFieldFuture::get));
```
In our `MetricsConfig.limitFieldIds` class we defined our own preorder-alike
to visit top level fields (also add to idSet) before visiting the childen. The
noop-like operation `iter.next();` is where the children get visited.
```java
public Set<Integer> struct(Types.StructType struct, Iterable<Set<Integer>>
fieldResults) {
Iterator<Types.NestedField> fields = struct.fields().iterator();
while (shouldContinue() && fields.hasNext()) {
Types.NestedField field = fields.next();
if (metricsEligible(field.type())) {
idSet.add(field.fieldId());
}
}
Iterator<Set<Integer>> iter = fieldResults.iterator();
while (shouldContinue() && iter.hasNext()) {
// visit children lazily to add more ids
iter.next();
}
return null;
}
```
I also reverted to add shouldContinue as this help with early termination
when we already reached limit and no need to visit children
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]