rdblue commented on code in PR #8621:
URL: https://github.com/apache/iceberg/pull/8621#discussion_r1335245581


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -126,7 +126,7 @@ abstract class SparkWrite implements Write, RequiresDistributionAndOrdering {
     this.writeSchema = writeSchema;
     this.dsSchema = dsSchema;
     this.extraSnapshotMetadata = writeConf.extraSnapshotMetadata();
-    this.partitionedFanoutEnabled = writeConf.fanoutWriterEnabled();
+    this.useFanoutWriters = writeConf.fanoutWriterEnabled() && !writeRequirements.hasOrdering();

Review Comment:
   I'm trying to think through the implications of this change.
   
   This assumes that if an ordering is present in `writeRequirements`, then that ordering clusters data by partition. When data is clustered by partition, there's no need to use a fanout writer. I think an ordering generated by Iceberg will always cluster by partition, but `SparkWriteRequirements` doesn't guarantee that, so relying on it is a little risky.
   
   Another implication is that `fanoutWriterEnabled` can _disable_ fanout writes but can no longer _enable_ them as an override, which is a behavior change. If incoming data isn't clustered for some reason -- whether that's a bug in Iceberg or a problem in Spark -- there is no way to override the behavior and use fanout to fix the write. I think that's a problem because it would block writes to a table.
   
   I think that this PR should instead make `SPARK_WRITE_PARTITIONED_FANOUT_ENABLED` optional. When the property is set explicitly, its value should be used. When it is not set, the logic should default to `!writeRequirements.hasOrdering()`.
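
   For illustration, the fallback in the `SparkWrite` constructor could look roughly like this (a minimal sketch; `fanoutWriterEnabledOpt()` is a hypothetical nullable accessor on `SparkWriteConf`, not an existing method):

   ```java
   // sketch only: honor an explicit fanout setting, otherwise fall back to the ordering check
   Boolean explicitFanout = writeConf.fanoutWriterEnabledOpt(); // hypothetical: null when not configured
   if (explicitFanout != null) {
     // an explicit setting wins, so fanout can still be force-enabled as an escape hatch
     this.useFanoutWriters = explicitFanout;
   } else {
     // default: only use fanout writers when the incoming data isn't ordered/clustered by partition
     this.useFanoutWriters = !writeRequirements.hasOrdering();
   }
   ```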


