openinx commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r560006175
##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
+ public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";
Review comment:
@rdblue I like the table you provided, but I have a few questions. For an
Iceberg table that defines __SortOrder__ columns, a Spark job will write
records sorted by the sort keys into the Parquet files. Should a Flink job
also write sorted records into its Parquet files? In other words, should we
keep the same __SortOrder__ semantics across engines, even though
accomplishing that is not cheap? (I raise this question because I noticed
the __Flink__ column in the table requires neither a local sort nor a
global sort.)
Or is __SortOrder__ meant to define only the write behavior and not the
read behavior, i.e. records read from a Parquet file are not guaranteed to
be ordered by the sort keys? Based on the table above, my understanding is
that defining the write behavior is what you are trying to accomplish.
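To make the local-vs-global distinction in my question concrete, here is a small self-contained Java sketch (this is hypothetical illustration code, not Iceberg's API; the class and helper names are made up). It models two data files that are each locally sorted by the sort key, then shows that a scan reading them back-to-back is not globally sorted, which is why a reader cannot rely on __SortOrder__ unless the spec pins down read-side guarantees:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-file ("local") sorting does not imply
// globally sorted reads across multiple data files.
public class SortSemanticsSketch {

    // Returns true if the values are in non-decreasing order.
    static boolean isGloballySorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two data files, each locally sorted by the sort key.
        List<Integer> fileA = List.of(1, 4, 9);
        List<Integer> fileB = List.of(2, 3, 8);

        // A scan that simply reads the files one after another.
        List<Integer> scan = new ArrayList<>(fileA);
        scan.addAll(fileB);

        System.out.println("fileA locally sorted:  " + isGloballySorted(fileA));
        System.out.println("fileB locally sorted:  " + isGloballySorted(fileB));
        System.out.println("scan globally sorted:  " + isGloballySorted(scan));
    }
}
```

A write-only semantic would accept the last line being false; a read-side ordering guarantee would require either a global sort at write time or a merge at read time.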
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]