Copilot commented on code in PR #15281:
URL: https://github.com/apache/iceberg/pull/15281#discussion_r2785353967
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -257,15 +261,8 @@ private long totalRecords(Snapshot snapshot) {
}
@Override
- public String description() {
- String groupingKeyFieldNamesAsString =
- groupingKeyType().fields().stream()
- .map(Types.NestedField::name)
- .collect(Collectors.joining(", "));
-
- return String.format(
- "%s (branch=%s) [filters=%s, groupedBy=%s]",
- table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
+ public String toString() {
+ return description();
}
@Override
Review Comment:
This change makes `toString()` delegate to `description()`, but
`Scan.description()` is commonly implemented as a default method that delegates
to `toString()` in Spark (per the Scan Javadoc pattern). If `SparkScan` (or any
subclass) relies on the default `description()`, this can create infinite
recursion (`description() -> toString() -> description() ...`) and a stack
overflow. Fix this by ensuring `SparkScan` provides a concrete `description()`
implementation (and having `toString()` call that), or by removing the
`toString()` override here and letting `description()` remain the single source
of truth.
```suggestion
```
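For illustration, a self-contained mini-model of the hazard and the proposed fix. These toy classes only mirror the delegation pattern described above; they are not the actual Spark `Scan` interface or the Iceberg classes:
```java
// Toy interface mirroring the described pattern: description() defaults to toString().
interface ToyScan {
  default String description() {
    return toString();
  }
}

// Hazard: relying on the default description() while toString() delegates back to it.
class RecursiveToyScan implements ToyScan {
  @Override
  public String toString() {
    return description(); // description() -> toString() -> description() ... StackOverflowError
  }
}

// Fix: a concrete description() terminates the chain, so toString() may safely delegate.
class SafeToyScan implements ToyScan {
  @Override
  public String description() {
    return "SafeToyScan [filters=...]"; // does not call toString()
  }

  @Override
  public String toString() {
    return description();
  }
}
```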
##########
spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java:
##########
@@ -3014,7 +3014,7 @@ private void checkJoinAndFilterConditions(String query, String join, String iceb
assertThat(planAsString)
.as("Pushed filters must match")
- .contains("[filters=" + icebergFilters + ",");
+ .contains("filters=" + icebergFilters + ",");
Review Comment:
Same issue as in the `TestFilterPushDown` comment below: `runtimeFilters=` includes
`filters=` as a substring, so this check can accidentally match the wrong section of
the plan string. Tighten the assertion to include an unambiguous delimiter
(e.g., `", filters="`) or some context around the expected field.
```suggestion
.contains(", filters=" + icebergFilters + ",");
```
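To illustrate the false positive, a small stand-alone check. The plan string below is fabricated, not a real Spark plan; AssertJ is assumed on the classpath, as it already is in the test:
```java
import static org.assertj.core.api.Assertions.assertThat;

public class FiltersSubstringDemo {
  public static void main(String[] args) {
    // Fabricated plan string in which only runtimeFilters= is present.
    String planAsString = "IcebergScan(... runtimeFilters=id = 1, ...)";

    // The loose check matches inside "runtimeFilters=" and passes even though
    // there is no standalone filters= section.
    assertThat(planAsString).contains("filters=" + "id = 1" + ",");

    // The tightened check with a leading delimiter does not match the
    // runtime-filter section, so it would correctly fail on this string.
    assertThat(planAsString).doesNotContain(", filters=" + "id = 1" + ",");
  }
}
```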
##########
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/sql/TestFilterPushDown.java:
##########
@@ -676,7 +676,7 @@ private void checkFilters(
assertThat(planAsString)
.as("Pushed filters must match")
- .contains("[filters=" + icebergFilters + ",");
+ .contains("filters=" + icebergFilters + ",");
Review Comment:
This assertion can produce false positives because `runtimeFilters=`
contains the substring `filters=`. If `icebergFilters` happens to match the
runtime filter string, the test could pass even if the main `filters=` section
is missing/incorrect. Make the match more specific (e.g., include a delimiter
like `", filters="` or match the full scan prefix such as `"IcebergScan("`
+ `"filters=..."`) so it cannot accidentally match `runtimeFilters=`.
```suggestion
.contains("filters=" + icebergFilters + ", runtimeFilters=");
```
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -153,6 +153,10 @@ protected List<Expression> filterExpressions() {
return filterExpressions;
}
+ protected String filtersDesc() {
+ return Spark3Util.describe(filterExpressions);
Review Comment:
Using a human-readable description string as a proxy for filter identity is
risky when it is later used by `equals()`/`hashCode()` in subclasses (as
introduced in this PR). Different filter expression trees can potentially
serialize to the same description string (lossy formatting, reordered terms,
elided parentheses), which can make `equals()` return true for non-equal scans
and break hash-based collections/caches. Prefer comparing the actual expression
objects (or a canonical, non-lossy representation) in `equals()`/`hashCode()`
rather than `Spark3Util.describe(...)` output.
```suggestion
return filterExpressions.stream()
.map(Expression::toString)
.collect(Collectors.joining(" AND "));
```
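As a hedged sketch of the alternative the comment recommends (comparing the expression objects themselves rather than their rendered description): the stand-in class and field names below are illustrative, and the sketch assumes the expression type provides value-based `equals()`/`hashCode()`; if it does not, a canonical non-lossy representation would be needed instead:
```java
import java.util.List;
import java.util.Objects;

// Illustrative stand-in for a scan that carries pushed-down filter expressions.
final class ScanIdentitySketch {
  private final String table;
  private final List<Object> filterExpressions; // the expression objects, not describe() output

  ScanIdentitySketch(String table, List<Object> filterExpressions) {
    this.table = table;
    this.filterExpressions = filterExpressions;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof ScanIdentitySketch)) {
      return false;
    }
    ScanIdentitySketch that = (ScanIdentitySketch) other;
    // Compare the expressions themselves; two trees that merely render to the
    // same description string are not conflated.
    return table.equals(that.table) && filterExpressions.equals(that.filterExpressions);
  }

  @Override
  public int hashCode() {
    return Objects.hash(table, filterExpressions);
  }
}
```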
##########
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java:
##########
@@ -1022,6 +1023,50 @@ public void testPartitionedOr() throws Exception {
assertThat(scan.planInputPartitions()).hasSize(4);
}
+ @TestTemplate
+ public void testBatchQueryScanDescription() throws Exception {
+ createPartitionedTable(spark, tableName, "data");
+ SparkScanBuilder builder = scanBuilder();
+
+ withSQLConf(
+ ImmutableMap.of(SparkSQLProperties.PRESERVE_DATA_GROUPING, "true"),
+ () -> {
+ Predicate predicate1 = new Predicate("=",
expressions(fieldRef("id"), intLit(1)));
+ Predicate predicate2 = new Predicate(">",
expressions(fieldRef("id"), intLit(0)));
+ pushFilters(builder, predicate1, predicate2);
+
+ Scan scan = builder.build();
+ String description = scan.description();
+
+ assertThat(description).contains("IcebergScan");
+ assertThat(description).contains(tableName);
+ assertThat(description).contains("filters=id = 1, id > 0");
+ assertThat(description).contains("groupedBy=data");
+ });
+ }
+
+ @TestTemplate
+ public void testCopyOnWriteScanDescription() throws Exception {
+ createPartitionedTable(spark, tableName, "data");
+ SparkScanBuilder builder = scanBuilder();
+
+ withSQLConf(
+ ImmutableMap.of(SparkSQLProperties.PRESERVE_DATA_GROUPING, "true"),
+ () -> {
+ Predicate predicate1 = new Predicate("=",
expressions(fieldRef("id"), intLit(2)));
+ Predicate predicate2 = new Predicate("<",
expressions(fieldRef("id"), intLit(10)));
+ pushFilters(builder, predicate1, predicate2);
+
+ Scan scan = builder.buildCopyOnWriteScan();
+ String description = scan.description();
+
+ assertThat(description).contains("IcebergCopyOnWriteScan");
+ assertThat(description).contains(tableName);
+ assertThat(description).contains("filters=id = 2, id < 10");
Review Comment:
These assertions bake in the exact formatting and ordering of the rendered
filter description (e.g., `"id = 1, id > 0"`). If `Spark3Util.describe(...)`
changes formatting (commas vs AND, spacing, predicate order), the tests will
fail even though behavior is correct. Consider asserting each predicate
independently (e.g., contains `"id = 1"` and contains `"id > 0"`) and
separately asserting the `filters=` label, rather than asserting the full
combined string in a single contains check.
```suggestion
assertThat(description).contains("filters=");
assertThat(description).contains("id = 2");
assertThat(description).contains("id < 10");
```
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkPartitioningAwareScan.java:
##########
@@ -250,4 +250,10 @@ private StructLikeSet collectGroupingKeys(Iterable<ScanTaskGroup<T>> taskGroupIt
return keys;
}
+
+ protected String groupingKeyDesc() {
+ return groupingKeyType().fields().stream()
+ .map(org.apache.iceberg.types.Types.NestedField::name)
+ .collect(Collectors.joining(", "));
+ }
Review Comment:
Using a fully-qualified class reference inside the stream
(`org.apache.iceberg.types.Types.NestedField::name`) reduces readability and is
inconsistent with nearby code that already refers to `Types`. Prefer
adding/using an import and referencing `Types.NestedField::name` instead.
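A sketch of the helper with the import in place, assuming `org.apache.iceberg.types.Types` can be imported in this file without a name clash:
```java
import org.apache.iceberg.types.Types;

// ...

protected String groupingKeyDesc() {
  return groupingKeyType().fields().stream()
      .map(Types.NestedField::name)
      .collect(Collectors.joining(", "));
}
```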
##########
spark/v4.1/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java:
##########
@@ -1022,6 +1023,50 @@ public void testPartitionedOr() throws Exception {
assertThat(scan.planInputPartitions()).hasSize(4);
}
+ @TestTemplate
+ public void testBatchQueryScanDescription() throws Exception {
+ createPartitionedTable(spark, tableName, "data");
+ SparkScanBuilder builder = scanBuilder();
+
+ withSQLConf(
+ ImmutableMap.of(SparkSQLProperties.PRESERVE_DATA_GROUPING, "true"),
+ () -> {
+ Predicate predicate1 = new Predicate("=",
expressions(fieldRef("id"), intLit(1)));
+ Predicate predicate2 = new Predicate(">",
expressions(fieldRef("id"), intLit(0)));
+ pushFilters(builder, predicate1, predicate2);
+
+ Scan scan = builder.build();
+ String description = scan.description();
+
+ assertThat(description).contains("IcebergScan");
+ assertThat(description).contains(tableName);
+ assertThat(description).contains("filters=id = 1, id > 0");
+ assertThat(description).contains("groupedBy=data");
Review Comment:
These assertions bake in the exact formatting and ordering of the rendered
filter description (e.g., `"id = 1, id > 0"`). If `Spark3Util.describe(...)`
changes formatting (commas vs AND, spacing, predicate order), the tests will
fail even though behavior is correct. Consider asserting each predicate
independently (e.g., contains `"id = 1"` and contains `"id > 0"`) and
separately asserting the `filters=` label, rather than asserting the full
combined string in a single contains check.
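A sketch of what the loosened assertions could look like for this test, mirroring the suggestion on the copy-on-write test above (values taken from the diff under review):
```java
assertThat(description).contains("filters=");
assertThat(description).contains("id = 1");
assertThat(description).contains("id > 0");
```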
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]