electrum commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r561367698



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
   public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
   public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
 
+  public static final String WRITE_SHUFFLE_BY_PARTITION = 
"write.shuffle-by.partition";

Review comment:
       @rdblue thanks for the pointer. Here are my thoughts on how this would 
work for Trino (formerly Presto SQL).
   
   Trino does streaming execution between stages -- there is no materialized 
shuffle phase. This means that global sorting would only be possible using a 
fixed range, not based on statistics, so it would be vulnerable to skew. I'd 
like to understand the use case for global "sort" compared to "partition".
   
   For local sorting, I see two choices:
   
   1. Write arbitrarily large files. Use a fixed size in-memory buffer, sort 
when full, write to temporary file, then merge files at end. There may be 
multiple merge passes in order to limit the number of files read at once during 
the merge. This is what we do for Hive bucketed-sorted tables, since sorting 
per bucket is required.
   2. Write multiple size-limited files. Use a fixed size in-memory buffer, 
sort when full, write final output file. Repeat until all input data for writer 
has been consumed.
   
   I would prefer the second option as it is simpler and uses fewer resources. 
It satisfies the property that each file is sorted and helps with compression 
and within-file filtering. The downside is that there are more files, but if 
they are of sufficient size, it shouldn't affect reads as we split files anyway 
when reading.
   
   Another option is to sort data using a fixed size buffer before writing each 
batch of rows. This would help with compression and within-file filtering, but 
wouldn't provide a guarantee on sorting for readers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to