RussellSpitzer commented on code in PR #4902:
URL: https://github.com/apache/iceberg/pull/4902#discussion_r906630281
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -139,7 +140,10 @@ private RewriteDataFiles checkAndApplyOptions(InternalRow args, RewriteDataFiles
   private RewriteDataFiles checkAndApplyStrategy(RewriteDataFiles action, String strategy, SortOrder sortOrder) {
     // caller of this function ensures that between strategy and sortOrder, at least one of them is not null.
-    if (strategy == null || strategy.equalsIgnoreCase("sort")) {
+    if (strategy == null || strategy.equalsIgnoreCase("sort") || strategy.equalsIgnoreCase("zorder")) {
Review Comment:
What I was thinking was that we just keep binpack and sort as the strategy options, and then add special casing along the lines of:

if (sort_order.startsWith("zorder")) {
  // zorder path
} else {
  // normal sort path
}

That is what we would do until we have better transform handling for multi-column transforms.
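
A rough Java sketch of that special casing, assuming the procedure still has the raw sort_order string and the action exposes a zOrder(String...) overload; the parsing and the parseSortOrder helper below are illustrative only, not the merged implementation:

import java.util.Arrays;
import java.util.Locale;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;

class SortOrderDispatchSketch {
  // Keep binpack and sort as the only strategies; a sort_order of the form
  // "zorder(c1, c2, ...)" is special-cased onto the z-order path.
  static RewriteDataFiles applySortOrderString(RewriteDataFiles action, Table table, String sortOrderString) {
    String trimmed = sortOrderString.trim();
    if (trimmed.toLowerCase(Locale.ROOT).startsWith("zorder")) {
      // pull "c1, c2" out of "zorder(c1, c2)" -- illustrative parsing only
      String inner = trimmed.substring(trimmed.indexOf('(') + 1, trimmed.lastIndexOf(')'));
      String[] zOrderCols = Arrays.stream(inner.split(","))
          .map(String::trim)
          .toArray(String[]::new);
      return action.zOrder(zOrderCols);  // assumes a zOrder(String...) overload on the action
    } else {
      // normal sort path: build an Iceberg SortOrder from the string, as the procedure already does
      return action.sort(parseSortOrder(table, trimmed));
    }
  }

  // Hypothetical placeholder for whatever the procedure already uses to parse the sort string.
  private static SortOrder parseSortOrder(Table table, String sortOrderString) {
    throw new UnsupportedOperationException("sketch only");
  }
}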
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/SparkZOrderStrategy.java:
##########
@@ -133,9 +133,23 @@ public SparkZOrderStrategy(Table table, SparkSession spark, List<String> zOrderC
           "Cannot perform ZOrdering, all columns provided were identity partition columns and cannot be used.");
     }
+    validateColumnsExistence(table, spark, zOrderColNames);
+
     this.zOrderColNames = zOrderColNames;
   }
+  private void validateColumnsExistence(Table table, SparkSession spark, List<String> colNames) {
+    boolean caseSensitive = Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
+    Schema schema = table.schema();
+    colNames.forEach(col -> {
+      NestedField nestedField = caseSensitive ? schema.findField(col) : schema.caseInsensitiveFindField(col);
+      if (nestedField == null) {
+        throw new IllegalArgumentException(
+            String.format("Cannot find field '%s' in struct: %s", col, schema.asStruct()));
Review Comment:
"Cannot find field x which is specified in the ZOrder in the table schema
x,y,z"
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/SparkZOrderStrategy.java:
##########
@@ -133,9 +133,23 @@ public SparkZOrderStrategy(Table table, SparkSession spark, List<String> zOrderC
           "Cannot perform ZOrdering, all columns provided were identity partition columns and cannot be used.");
     }
+    validateColumnsExistence(table, spark, zOrderColNames);
+
     this.zOrderColNames = zOrderColNames;
   }
+  private void validateColumnsExistence(Table table, SparkSession spark, List<String> colNames) {
+    boolean caseSensitive = Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
Review Comment:
Rather than this, we can use the session resolver class, which automatically handles case sensitivity based on the conf.
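
For illustration, a minimal sketch of that suggestion, assuming the analyzer's Resolver (a (String, String) => Boolean function) is reachable from Java via spark.sessionState().analyzer().resolver(); this version only checks top-level columns and is not the merged implementation:

import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.spark.sql.SparkSession;
import scala.Function2;

class ZOrderColumnCheckSketch {
  // Validate that every requested z-order column exists in the schema, letting
  // Spark's session Resolver decide case sensitivity instead of parsing the conf ourselves.
  static void validateColumnsExistence(Schema schema, SparkSession spark, List<String> colNames) {
    Function2<String, String, Object> resolver = spark.sessionState().analyzer().resolver();
    for (String col : colNames) {
      boolean found = schema.columns().stream()
          .anyMatch(field -> (Boolean) resolver.apply(field.name(), col));
      if (!found) {
        throw new IllegalArgumentException(String.format(
            "Cannot find column '%s' specified in the Z-order in table schema: %s",
            col, schema.asStruct()));
      }
    }
  }
}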
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]