github-actions[bot] commented on code in PR #62821:
URL: https://github.com/apache/doris/pull/62821#discussion_r3345327921
##########
be/src/exec/scan/file_scanner.cpp:
##########
@@ -1689,6 +1664,10 @@ Status FileScanner::_generate_partition_columns() {
if (!range.__isset.columns_from_path_keys) {
return Status::OK();
}
+ DORIS_CHECK(range.__isset.columns_from_path);
+ DORIS_CHECK(range.__isset.columns_from_path_is_null);
Review Comment:
`columns_from_path_is_null` is a newly-added optional Thrift field, but this
makes it mandatory whenever `columns_from_path_keys` is present. During a
rolling upgrade, an old FE still sends
`columns_from_path_keys`/`columns_from_path` without this new field for
Hive/Hudi/load scan ranges, and a new BE will hit this `DORIS_CHECK` and
crash/fail the scanner before reading any data. The old code explicitly
tolerated the field being absent, so please keep a compatibility branch here
(and in the other BE extraction paths that added the same check) that treats
missing null flags as all non-null or otherwise handles old FE scan ranges
safely.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java:
##########
@@ -704,6 +705,57 @@ public static Map<String, String>
getPartitionInfoMap(PartitionData partitionDat
return partitionInfoMap;
}
+ public static List<String> getIdentityPartitionColumns(Table table) {
+ LinkedHashSet<String> partitionColumns = new LinkedHashSet<>();
+ for (PartitionSpec spec : table.specs().values()) {
+ for (PartitionField partitionField : spec.fields()) {
Review Comment:
This returns the union of identity partition columns across every Iceberg
partition spec, but `IcebergScanNode` uses that list to classify slots for the
whole scan while each split only sets `columns_from_path` for its own spec. If
a table evolves from spec `identity(dt)` to `identity(dt), identity(region)`,
files written under the old spec do not have a `region` partition value;
nevertheless `region` is classified as `PARTITION_KEY`, so BE will not read
`region` from the data file for those old files and will return a
missing/default value instead of the stored column. Please make the
classification safe for per-spec evolution, for example only treating a column
as metadata-filled when every planned split/spec can provide that identity
value, or keep such columns as regular file columns unless the split has the
metadata value. Add a test with multiple Iceberg specs where an identity
partition column is added after existing files.
##########
regression-test/suites/external_table_p0/paimon/test_paimon_partition_table.groovy:
##########
@@ -52,6 +52,74 @@ suite("test_paimon_partition_table", "p0,external") {
String baseQueryName = "qt_show_partition_${tableName}"
"$baseQueryName" """show partitions from ${tableName};"""
}
+
+ sql """set force_jni_scanner=true"""
+
+ def salesByDateRows = sql """select sale_date from sales_by_date order
by sale_date"""
+ assertEquals(
Review Comment:
For deterministic query results, the regression-test standard in `AGENTS.md`
asks to use `qt_*`/`order_qt_*` and generated `.out` files rather than Groovy
`assertEquals`. These new fixed-result checks will not be reflected in the
expected-output file and are inconsistent with the rest of this suite's
`qt_show_partition_*` pattern. Please convert these result checks to ordered
`qt` cases (or add a clear reason why assertion-based validation is necessary).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]