[GitHub] [iceberg] rdblue commented on a change in pull request #3461: Spark: Request distribution and ordering for writes

GitBox Sun, 07 Nov 2021 11:47:58 -0800


rdblue commented on a change in pull request #3461:
URL: https://github.com/apache/iceberg/pull/3461#discussion_r744306962




##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java
##########
@@ -112,42 +136,74 @@ public BatchWrite buildForBatch() {
     // Get application id
     String appId = spark.sparkContext().applicationId();
 
-    SparkWrite write = new SparkWrite(spark, table, writeConf, writeInfo, appId, writeSchema, dsSchema);
-    if (overwriteByFilter) {
-      return write.asOverwriteByFilter(overwriteExpr);
-    } else if (overwriteDynamic) {
-      return write.asDynamicOverwrite();
-    } else if (overwriteFiles) {
-      return write.asCopyOnWriteMergeWrite(mergeScan, isolationLevel);
+    Distribution distribution;
+    SortOrder[] ordering;
+
+    if (requestDistributionAndOrdering) {
+      distribution = buildRequiredDistribution();
+      ordering = buildRequiredOrdering(distribution);
     } else {
-      return write.asBatchAppend();
+      LOG.warn("Can't request distribution/ordering as extensions are disabled and spec has non-identity transforms");

Review comment:
       I think we generally try to hide from end users that partitioning by a 
column actually uses an identity transform. Can we update this to state that 
the table partitioning includes transforms? Like "Skipping distribution and 
ordering request: extensions are disabled and partitioning uses unsupported 
transforms"?
   
   I also like to use "Skipping" rather than "Can't" because this is going 
ahead with the write. If we "can't" do something we normally fail.
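A minimal, dependency-free sketch of the suggested wording follows. It does not use Spark or Iceberg; the class and method names (`DistributionRequestSketch`, `canRequest`, `skipWarning`) are hypothetical. The rule "request unless extensions are disabled and partitioning uses unsupported transforms" is an assumption inferred from the original log text, not taken from the PR's actual logic.

```java
import java.util.Optional;

// Hypothetical sketch only: models the branch in SparkWriteBuilder.buildForBatch()
// without Spark/Iceberg types. All names here are illustrative, not the PR's code.
public class DistributionRequestSketch {

  // Wording proposed in the review: "Skipping" signals the write still proceeds,
  // and "unsupported transforms" avoids exposing identity transforms to users.
  static final String SKIP_MESSAGE =
      "Skipping distribution and ordering request: extensions are disabled "
          + "and partitioning uses unsupported transforms";

  // Assumed rule (inferred from the log text): a request is possible when
  // extensions are enabled, or when every partition transform is supported.
  static boolean canRequest(boolean extensionsEnabled, boolean hasUnsupportedTransforms) {
    return extensionsEnabled || !hasUnsupportedTransforms;
  }

  // Returns the warning to log, or empty when distribution/ordering is requested.
  static Optional<String> skipWarning(boolean extensionsEnabled, boolean hasUnsupportedTransforms) {
    if (canRequest(extensionsEnabled, hasUnsupportedTransforms)) {
      return Optional.empty();
    }
    return Optional.of(SKIP_MESSAGE);
  }

  public static void main(String[] args) {
    // Extensions disabled and unsupported transforms present: warn, then keep writing.
    skipWarning(false, true).ifPresent(msg -> System.out.println("WARN " + msg));
  }
}
```

Returning an `Optional` keeps the decision testable separately from logging; in the real builder the message would simply go to `LOG.warn` and the write would continue.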




