RussellSpitzer commented on code in PR #4902:
URL: https://github.com/apache/iceberg/pull/4902#discussion_r906630281
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -139,7 +140,10 @@ private RewriteDataFiles checkAndApplyOptions(InternalRow args, RewriteDataFiles
   private RewriteDataFiles checkAndApplyStrategy(RewriteDataFiles action, String strategy, SortOrder sortOrder) {
     // caller of this function ensures that between strategy and sortOrder, at least one of them is not null.
-    if (strategy == null || strategy.equalsIgnoreCase("sort")) {
+    if (strategy == null || strategy.equalsIgnoreCase("sort") || strategy.equalsIgnoreCase("zorder")) {
Review Comment:
What I was thinking was that we just keep binpack and sort as the strategy options, and then add special casing along the lines of:

if (sort_order.startsWith("zorder")) {
  // zorder path
} else {
  // normal sort path
}

That is what we would do until we have better transform handling for multi-column transforms.
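
A rough Java sketch of that special casing, assuming the procedure still has the raw sort_order string and the action exposes a zOrder(String...) overload; the parsing and the parseSortOrder helper below are illustrative only, not the merged implementation:

import java.util.Arrays;
import java.util.Locale;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;

class SortOrderDispatchSketch {
  // Keep binpack and sort as the only strategies; a sort_order of the form
  // "zorder(c1, c2, ...)" is special-cased onto the z-order path.
  static RewriteDataFiles applySortOrderString(RewriteDataFiles action, Table table, String sortOrderString) {
    String trimmed = sortOrderString.trim();
    if (trimmed.toLowerCase(Locale.ROOT).startsWith("zorder")) {
      // pull "c1, c2" out of "zorder(c1, c2)" -- illustrative parsing only
      String inner = trimmed.substring(trimmed.indexOf('(') + 1, trimmed.lastIndexOf(')'));
      String[] zOrderCols = Arrays.stream(inner.split(","))
          .map(String::trim)
          .toArray(String[]::new);
      return action.zOrder(zOrderCols);  // assumes a zOrder(String...) overload on the action
    } else {
      // normal sort path: build an Iceberg SortOrder from the string, as the procedure already does
      return action.sort(parseSortOrder(table, trimmed));
    }
  }

  // Hypothetical placeholder for whatever the procedure already uses to parse the sort string.
  private static SortOrder parseSortOrder(Table table, String sortOrderString) {
    throw new UnsupportedOperationException("sketch only");
  }
}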
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/SparkZOrderStrategy.java:
##########
@@ -133,9 +133,23 @@ public SparkZOrderStrategy(Table table, SparkSession spark, List<String> zOrderC
           "Cannot perform ZOrdering, all columns provided were identity partition columns and cannot be used.");
     }
+    validateColumnsExistence(table, spark, zOrderColNames);
+
     this.zOrderColNames = zOrderColNames;
   }
+  private void validateColumnsExistence(Table table, SparkSession spark, List<String> colNames) {
+    boolean caseSensitive = Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
+    Schema schema = table.schema();
+    colNames.forEach(col -> {
+      NestedField nestedField = caseSensitive ? schema.findField(col) : schema.caseInsensitiveFindField(col);
+      if (nestedField == null) {
+        throw new IllegalArgumentException(
+            String.format("Cannot find field '%s' in struct: %s", col, schema.asStruct()));
Review Comment:
"Cannot find field x which is specified in the ZOrder in the table schema
x,y,z"
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/SparkZOrderStrategy.java:
##########
@@ -133,9 +133,23 @@ public SparkZOrderStrategy(Table table, SparkSession spark, List<String> zOrderC
           "Cannot perform ZOrdering, all columns provided were identity partition columns and cannot be used.");
     }
+    validateColumnsExistence(table, spark, zOrderColNames);
+
     this.zOrderColNames = zOrderColNames;
   }
+  private void validateColumnsExistence(Table table, SparkSession spark, List<String> colNames) {
+    boolean caseSensitive = Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
Review Comment:
Rather than this, we can use the session resolver class, which automatically handles case sensitivity based on the conf.
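
For illustration, a minimal sketch of that suggestion, assuming the analyzer's Resolver (a (String, String) => Boolean function) is reachable from Java via spark.sessionState().analyzer().resolver(); this version only checks top-level columns and is not the merged implementation:

import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.spark.sql.SparkSession;
import scala.Function2;

class ZOrderColumnCheckSketch {
  // Validate that every requested z-order column exists in the schema, letting
  // Spark's session Resolver decide case sensitivity instead of parsing the conf ourselves.
  static void validateColumnsExistence(Schema schema, SparkSession spark, List<String> colNames) {
    Function2<String, String, Object> resolver = spark.sessionState().analyzer().resolver();
    for (String col : colNames) {
      boolean found = schema.columns().stream()
          .anyMatch(field -> (Boolean) resolver.apply(field.name(), col));
      if (!found) {
        throw new IllegalArgumentException(String.format(
            "Cannot find column '%s' specified in the Z-order in table schema: %s",
            col, schema.asStruct()));
      }
    }
  }
}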
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]