[GitHub] [iceberg] rdblue commented on a diff in pull request #8621: Spark 3.5: Use fanout writers for unsorted tables by default

via GitHub Tue, 26 Sep 2023 12:11:06 -0700


rdblue commented on code in PR #8621:
URL: https://github.com/apache/iceberg/pull/8621#discussion_r1337665477



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java:
##########
@@ -174,12 +174,17 @@ public long targetDataFileSize() {
         .parse();
   }
 
-  public boolean fanoutWriterEnabled() {
+  public boolean useFanoutWriter(SparkWriteRequirements writeRequirements) {
+    boolean defaultValue = !writeRequirements.hasOrdering();

Review Comment:
   I think the argument against it is that this would blow up memory. A local 
sort is at least memory-safe, but opening hundreds of Parquet files at once is 
not and could cause an OutOfMemoryError.
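
To make the memory concern concrete, here is a rough back-of-the-envelope sketch, not anything taken from the PR. It assumes each open Parquet writer may buffer up to one full row group before flushing, and it uses 128 MiB as an assumed row group size. The class and method names are hypothetical and exist only for illustration. Real writers often hold less than this, so treat the result as a pessimistic upper bound.

```java
// Hypothetical illustration only; not part of Iceberg or of PR #8621.
public class FanoutMemorySketch {

  // Assumed row group size (128 MiB). The real value depends on table configuration.
  static final long ASSUMED_ROW_GROUP_BYTES = 128L * 1024 * 1024;

  // Pessimistic upper bound: every open file buffers one full row group in memory.
  static long worstCaseBufferedBytes(int openFiles, long rowGroupBytes) {
    if (openFiles < 0 || rowGroupBytes < 0) {
      throw new IllegalArgumentException("arguments must be non-negative");
    }
    // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
    return Math.multiplyExact((long) openFiles, rowGroupBytes);
  }

  public static void main(String[] args) {
    // A fanout writer keeps one file open per partition seen by the task.
    int openFiles = 200;
    long bytes = worstCaseBufferedBytes(openFiles, ASSUMED_ROW_GROUP_BYTES);
    double gib = bytes / (1024.0 * 1024.0 * 1024.0);
    System.out.printf("%d open files -> up to %.1f GiB buffered%n", openFiles, gib);
  }
}
```

Under these assumptions, 200 simultaneously open files could buffer up to 25 GiB in a single task, while a writer fed by a local sort keeps only one file open at a time.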



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
