aokolnychyi commented on code in PR #7637:
URL: https://github.com/apache/iceberg/pull/7637#discussion_r1199411557
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java:
##########
@@ -326,16 +445,41 @@ public void testRangeWritePartitionedSortedTable() {
checkWriteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
}
+ @Test
+ public void testRangeWritePartitionedSortedTableFanout() {
+ sql(
+ "CREATE TABLE %s (id BIGINT, data STRING, date DATE, ts TIMESTAMP) "
+ + "USING iceberg "
+ + "PARTITIONED BY (date)",
+ tableName);
+
+ Table table = validationCatalog.loadTable(tableIdent);
+
+ table.replaceSortOrder().asc("id").commit();
+
+ table.updateProperties().set(SPARK_WRITE_PARTITIONED_FANOUT_ENABLED, "true").commit();
+
+ SortOrder[] expectedOrdering =
+     new SortOrder[] {
+       Expressions.sort(Expressions.column("date"), SortDirection.ASCENDING),
+       Expressions.sort(Expressions.column("id"), SortDirection.ASCENDING)
+     };
+
+ Distribution expectedDistribution = Distributions.ordered(expectedOrdering);
+
+ checkWriteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
+ }
+
// =============================================================
// Distribution and ordering for copy-on-write DELETE operations
// =============================================================
//
// UNPARTITIONED UNORDERED
// -------------------------------------------------------------------------
- // delete mode is NOT SET -> CLUSTER BY _file + LOCALLY ORDER BY _file, _pos
+ // delete mode is NOT SET -> CLUSTER BY _file + empty ordering
Review Comment:
I disabled the local sort by `_file` and `_pos` in DELETE operations as it
is not helping that much. If we perform a DELETE operation and shuffle the
records all over the place, we will cluster them by `_file` before writing. In
most cases, records from multiple files will end up in a single task. If we
stitch together two sorted chunks into one output file, the order of that file
will be broken. So what’s the point of doing the sort and potentially spilling
to disk? There is a very narrow use case where the old behavior could make
sense: a task gets records from only a single file, and that file was properly
sorted, yet the sort order is not defined in the table. I don’t think it is a
good idea to optimize for that use case. Keep in mind it only happens if the
sort order is empty. In most cases, it really means there is no reasonable sort
order to preserve.
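
The "stitched chunks" point can be illustrated with a toy example in plain Java (not Iceberg code; the list contents are made up): two runs that are each sorted on their own generally do not form a sorted sequence once concatenated, so a per-file local sort is wasted as soon as one task writes rows that came from multiple files.

```java
import java.util.ArrayList;
import java.util.List;

public class StitchedChunks {
  // Returns true if the list is in non-decreasing order.
  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two chunks, each sorted on its own (e.g. rows surviving a DELETE
    // that originated from two different source files).
    List<Integer> fileA = List.of(1, 4, 9);
    List<Integer> fileB = List.of(2, 3, 8);

    // A task clustered by _file writes both chunks into one output file.
    List<Integer> outputFile = new ArrayList<>(fileA);
    outputFile.addAll(fileB);

    System.out.println(isSorted(fileA));      // true
    System.out.println(isSorted(fileB));      // true
    System.out.println(isSorted(outputFile)); // false: 9 precedes 2
  }
}
```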
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]