rdblue commented on a change in pull request #2064:
URL: https://github.com/apache/iceberg/pull/2064#discussion_r562160730
##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -138,6 +138,9 @@ private TableProperties() {
public static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";
public static final boolean ENGINE_HIVE_ENABLED_DEFAULT = false;
+ public static final String WRITE_SHUFFLE_BY_PARTITION = "write.shuffle-by.partition";
Review comment:
@electrum, as for what a "local sort" means, option 2 sounds good to me for a
task-level sort. If that sort is needlessly expensive, then it is okay for
Trino to skip it. But if a table has a defined sort order, I think the right
thing would be for Trino to apply it.
For data distribution, it sounds like Trino will only support `none` and
`hash` modes in the short term. That's reasonable given that you can't stage
data and use it twice. Even with shuffle data reuse, global sort in Spark is
quite expensive in some cases (doing a large join twice, for example).
Eventually, we want the table metadata to carry a sketch of the data
distribution, so that you can use it to get ranges for a global ordering.
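
To make the distribution modes concrete, here is a minimal sketch of how a row could be routed to a writer task under each one. It is illustrative only: the class, the enum, and the `targetTask` helper are hypothetical names, not Iceberg, Spark, or Trino APIs, and `RANGE` is just a label for the global-ordering case. The range bounds are assumed to come from somewhere else, such as sampling or a distribution sketch kept in table metadata.

```java
import java.util.Arrays;
import java.util.Objects;

public class DistributionSketch {
  // Hypothetical mode names: NONE and HASH mirror the `none` and `hash` modes
  // mentioned above; RANGE stands in for a global ordering.
  enum Mode { NONE, HASH, RANGE }

  // Picks the writer task for a row. For RANGE, rangeBounds holds sorted,
  // inclusive upper bounds, so there are rangeBounds.length + 1 tasks.
  static int targetTask(Mode mode, int sourceTask, Object partitionKey,
                        long sortKey, long[] rangeBounds, int numTasks) {
    switch (mode) {
      case NONE:
        // No shuffle: the row stays on the task that produced it.
        return sourceTask;
      case HASH:
        // Rows with the same partition key always land on the same task.
        return Math.floorMod(Objects.hashCode(partitionKey), numTasks);
      case RANGE:
        // Binary search over the bounds finds the range containing sortKey.
        int idx = Arrays.binarySearch(rangeBounds, sortKey);
        return idx >= 0 ? idx : -(idx + 1);
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }
}
```

Only the `RANGE` case needs to know the data distribution before any row is written, which is why it either requires a second pass over the data or precomputed bounds.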