openinx commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r557099820
##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
+ public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";
Review comment:
Let me collect all the questions here:
> Align the table properties to other engines ?
Yes, I agree. @aokolnychyi, I think the three write modes are tied to the
__SortOrder__ specification: each mode determines the actual write
behavior. (By the way, what are the semantics of __local sort__? Global sort is
easy to understand. Does local sort mean we buffer records in an in-memory
sorted map and flush them to a file once a memory threshold is reached, so that
all the records written by the same task are always sorted locally?)
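To make the question concrete, here is a minimal sketch of the local-sort behavior guessed at above. It is only an illustration of that guess, not how Iceberg or any engine implements it: the class `LocalSortBuffer`, its methods, and the record-count threshold (standing in for a memory threshold) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Iceberg.
public class LocalSortBuffer {
  // Assumption: a record count stands in for the memory threshold.
  private final int maxBufferedRecords;
  // TreeMap keeps the buffered records ordered by sort key.
  private final TreeMap<String, List<String>> buffer = new TreeMap<>();
  // Each inner list models one flushed data file.
  private final List<List<String>> flushedFiles = new ArrayList<>();
  private int buffered = 0;

  public LocalSortBuffer(int maxBufferedRecords) {
    this.maxBufferedRecords = maxBufferedRecords;
  }

  // Buffer one record; flush once the threshold is reached.
  public void write(String sortKey, String record) {
    buffer.computeIfAbsent(sortKey, k -> new ArrayList<>()).add(record);
    buffered++;
    if (buffered >= maxBufferedRecords) {
      flush();
    }
  }

  // Emit the buffered records as one "file", in sort-key order.
  public void flush() {
    if (buffered == 0) {
      return;
    }
    List<String> file = new ArrayList<>();
    buffer.values().forEach(file::addAll); // TreeMap iterates in key order
    flushedFiles.add(file);
    buffer.clear();
    buffered = 0;
  }

  public List<List<String>> flushedFiles() {
    return flushedFiles;
  }
}
```

Under this reading, every flushed file is sorted internally, but two files from the same task can overlap in key range, which is the difference from a global sort.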
Back to this PR: we don't define the sort-order write behavior for Flink.
The purpose is to reduce small files by shuffling, so that each sub-task doesn't
have to write so many files. It's an option specific to streaming jobs (as
@stevenzwu said, we don't do a real sort in Flink streaming because sorting is
expensive when records are processed one by one, incrementally).
I don't think it's good to define this as a global table property,
because it's actually a job-level configuration key.
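The small-files argument can be illustrated with a rough sketch in plain Java (no Flink dependency). It assumes each sub-task keeps one open file per partition it sees, and that without the shuffle records reach sub-tasks round-robin; the class `ShuffleByPartitionSketch` and the method `openFiles` are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts (sub-task, partition) pairs as a proxy for written files.
public class ShuffleByPartitionSketch {

  public static int openFiles(List<String> partitionKeys, int parallelism, boolean shuffleByPartition) {
    Map<Integer, Set<String>> partitionsPerSubtask = new HashMap<>();
    for (int i = 0; i < partitionKeys.size(); i++) {
      String key = partitionKeys.get(i);
      // With the shuffle, a partition key always maps to the same sub-task.
      // Without it, records are assumed to arrive round-robin.
      int subtask = shuffleByPartition
          ? Math.floorMod(key.hashCode(), parallelism)
          : i % parallelism;
      partitionsPerSubtask.computeIfAbsent(subtask, s -> new HashSet<>()).add(key);
    }
    // One file per distinct partition seen by each sub-task.
    return partitionsPerSubtask.values().stream().mapToInt(Set::size).sum();
  }
}
```

With 4 partitions and parallelism 4, round-robin input can put every partition on every sub-task (up to 16 files per checkpoint), while shuffling by partition key caps the count at one file per partition. The trade-off is possible skew when a few partitions receive most of the data.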