aokolnychyi commented on code in PR #7637:
URL: https://github.com/apache/iceberg/pull/7637#discussion_r1199411557
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java:
##########
@@ -326,16 +445,41 @@ public void testRangeWritePartitionedSortedTable() {
checkWriteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
}
+ @Test
+ public void testRangeWritePartitionedSortedTableFanout() {
+ sql(
+ "CREATE TABLE %s (id BIGINT, data STRING, date DATE, ts TIMESTAMP) "
+ + "USING iceberg "
+ + "PARTITIONED BY (date)",
+ tableName);
+
+ Table table = validationCatalog.loadTable(tableIdent);
+
+ table.replaceSortOrder().asc("id").commit();
+
+ table.updateProperties().set(SPARK_WRITE_PARTITIONED_FANOUT_ENABLED, "true").commit();
+
+ SortOrder[] expectedOrdering =
+     new SortOrder[] {
+       Expressions.sort(Expressions.column("date"), SortDirection.ASCENDING),
+       Expressions.sort(Expressions.column("id"), SortDirection.ASCENDING)
+     };
+
+ Distribution expectedDistribution = Distributions.ordered(expectedOrdering);
+
+ checkWriteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
+ }
+
// =============================================================
// Distribution and ordering for copy-on-write DELETE operations
// =============================================================
//
// UNPARTITIONED UNORDERED
// -------------------------------------------------------------------------
- // delete mode is NOT SET -> CLUSTER BY _file + LOCALLY ORDER BY _file, _pos
+ // delete mode is NOT SET -> CLUSTER BY _file + empty ordering
Review Comment:
I disabled the local sort by `_file` and `_pos` in DELETE operations as it
is not helping that much. If we perform a DELETE operation and shuffle the
records all over the place, we will cluster them by `_file` before writing. In
most cases, records from multiple files will end up in a single task. If we
stitch together two sorted chunks into one output file, the order of that file
will be broken. So what’s the point of doing the sort and potentially spilling
to disk? There is a very narrow use case where the old behavior could make
sense: a task gets records from only a single file, and that file was properly
sorted, yet the sort order is not defined in the table. I don’t think it is a
good idea to optimize for that use case. Keep in mind it only happens if the
sort order is empty. In most cases, it really means there is no reasonable sort
order to preserve.
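
The "stitched chunks" point can be illustrated with a toy example in plain Java (not Iceberg code; the list contents are made up): two runs that are each sorted on their own generally do not form a sorted sequence once concatenated, so a per-file local sort is wasted as soon as one task writes rows that came from multiple files.

```java
import java.util.ArrayList;
import java.util.List;

public class StitchedChunks {
  // Returns true if the list is in non-decreasing order.
  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two chunks, each sorted on its own (e.g. rows surviving a DELETE
    // that originated from two different source files).
    List<Integer> fileA = List.of(1, 4, 9);
    List<Integer> fileB = List.of(2, 3, 8);

    // A task clustered by _file writes both chunks into one output file.
    List<Integer> outputFile = new ArrayList<>(fileA);
    outputFile.addAll(fileB);

    System.out.println(isSorted(fileA));      // true
    System.out.println(isSorted(fileB));      // true
    System.out.println(isSorted(outputFile)); // false: 9 precedes 2
  }
}
```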
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]