aokolnychyi commented on code in PR #8621:
URL: https://github.com/apache/iceberg/pull/8621#discussion_r1337601468
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java:
##########
@@ -174,12 +174,17 @@ public long targetDataFileSize() {
.parse();
}
- public boolean fanoutWriterEnabled() {
+ public boolean useFanoutWriter(SparkWriteRequirements writeRequirements) {
+ boolean defaultValue = !writeRequirements.hasOrdering();
Review Comment:
Actually, we may want to keep it this way.
Another use case that would benefit from the current approach is
storage-partitioned joins (SPJ). There, users set the distribution mode to
`none`. Without explicitly enabling fanout writers, they would get a super
expensive local sort and spill. I think that's a more realistic scenario than
setting the mode to `none` and generating tons of files per task. If users set
the mode to `none` explicitly, they are probably OK with the number of
produced files, which hints it is not a crazy number. So why do a local sort
for them?
We have generally preferred safe options over more performant ones, which
meant more configs were required to improve performance. I'd say let's turn
that around.
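To make the direction concrete, here is a minimal sketch of the decision being discussed: an explicit user setting always wins, and otherwise fanout defaults to on whenever the write has no required ordering. The helper below is hypothetical (a stand-in for `SparkWriteConf.useFanoutWriter` and `SparkWriteRequirements.hasOrdering()`), not the actual Iceberg implementation:

```java
// Hypothetical sketch of the proposed fanout default, NOT the actual
// Iceberg implementation. An explicit user setting always wins; otherwise
// fanout defaults to on whenever the write has no required ordering, so a
// distribution mode of `none` (e.g. for SPJ) avoids a local sort and spill.
public class FanoutWriterDefaultSketch {

  static boolean useFanoutWriter(boolean hasOrdering, Boolean explicitFanoutSetting) {
    // no required ordering -> fanout on by default
    boolean defaultValue = !hasOrdering;
    return explicitFanoutSetting != null ? explicitFanoutSetting : defaultValue;
  }

  public static void main(String[] args) {
    // SPJ-style write: no ordering, no explicit config -> fanout on, no sort
    System.out.println(useFanoutWriter(false, null)); // prints true
    // Ordered write, no explicit config -> keep the sorted (clustered) writer
    System.out.println(useFanoutWriter(true, null)); // prints false
    // An explicit user setting overrides the default either way
    System.out.println(useFanoutWriter(true, true)); // prints true
  }
}
```

Under this scheme, users who explicitly set the distribution mode to `none` get the cheap path by default, and the older sort-by-default behavior remains available via the explicit fanout config.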
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]