openinx commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r557099820



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
   public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
   public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
 
+  public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";

Review comment:
       Let me collect all the questions here:
   
   > Align the table properties with other engines?
   
   Yes, I agree. @aokolnychyi, I think the three write modes are related
to the __SortOrder__ specification: the mode decides the actual write
behavior. (By the way, what are the semantics of __local sort__? Global sort
is easy to understand. Does local sort mean that we buffer records in an
in-memory sorted map and flush them to a file once a memory threshold is
reached? Would all the records written by the same task then always be
sorted locally?)
   
   Back to this PR: we don't define the sort-order write behavior for Flink.
The purpose is to reduce small files by shuffling, so that each subtask doesn't
have to write so many files. It's an option specific to streaming jobs (as
@stevenzwu said, we don't do a real sort in Flink streaming because sorting is
expensive when records are processed one by one, incrementally).
   I don't think it's good to define this as a global table property,
because it's actually a job-level configuration key.
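
   To make the small-file argument concrete, here is a toy simulation in plain Java. It is only a sketch and uses no Flink or Iceberg APIs: the class and method names are hypothetical, and it assumes each writer subtask opens one data file per distinct partition value it receives within a checkpoint interval. It compares round-robin distribution with routing records by a hash of the partition value.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not part of Iceberg or Flink.
public class ShuffleByPartitionSketch {

  // Counts the files written in one checkpoint interval, ASSUMING a subtask
  // opens exactly one file per distinct partition value it receives.
  public static int countFiles(int[] recordPartitions, int parallelism,
                               boolean shuffleByPartition) {
    Set<String> openFiles = new HashSet<>();
    for (int i = 0; i < recordPartitions.length; i++) {
      int partition = recordPartitions[i];
      int subtask = shuffleByPartition
          // Route by partition value: one partition always lands on one subtask.
          ? Math.floorMod(Integer.hashCode(partition), parallelism)
          // No shuffle: records are spread round-robin across subtasks.
          : i % parallelism;
      openFiles.add(subtask + "/" + partition);
    }
    return openFiles.size();
  }

  public static void main(String[] args) {
    int parallelism = 4;
    int partitions = 8;
    int[] records = new int[1000];
    for (int i = 0; i < records.length; i++) {
      // Arrange the input so every subtask sees every partition when
      // records are distributed round-robin.
      records[i] = (i / parallelism) % partitions;
    }
    System.out.println("files without shuffle: "
        + countFiles(records, parallelism, false));
    System.out.println("files with shuffle:    "
        + countFiles(records, parallelism, true));
  }
}
```

   With this input, the round-robin case writes 4 x 8 = 32 files, while the shuffled case writes 8, one per partition. The gain depends on the job's parallelism and on how its input is distributed, which is one reason the switch reads more naturally as a job-level option than as a table property.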




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


