RussellSpitzer commented on issue #3564:
URL: https://github.com/apache/iceberg/issues/3564#issuecomment-982768379


   @andrei-ionescu, the DataSourceV2 API allows the underlying data source 
(Iceberg) to declare what distribution (and ordering) incoming writes must 
have for a write to succeed. 
   
   So an Iceberg table that is partitioned on some set of columns will always 
advertise that it requires the incoming data to be sorted by an expression 
that keeps all rows from a single Iceberg partition together. The Iceberg 
source only advertises this requirement; Spark is what actually performs 
the sort.
   
   See 
https://github.com/apache/spark/blob/ac7c52db28f35237f78215c38b274a45c1ae7462/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
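   A minimal sketch of that division of labor, written in plain Python with no Spark or Iceberg dependency. The table (standing in for the Iceberg data source) only declares the sort key it needs, and the engine (standing in for Spark) is the one that sorts the rows before handing them to the writer. The names `PartitionedTable`, `required_ordering`, and `plan_write` are hypothetical illustrations, not Spark or Iceberg APIs. The real contract is the Java interface linked above, whose `requiredDistribution()` and `requiredOrdering()` methods are collapsed here into a single sort key. The writer is assumed to keep only one partition open at a time and to reject input where a partition reappears after its file was closed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


@dataclass
class PartitionedTable:
    """Hypothetical stand-in for an Iceberg table partitioned on some columns."""
    name: str
    partition_columns: List[str]
    files: List[List[Row]] = field(default_factory=list)

    def required_ordering(self) -> Callable[[Row], Tuple]:
        # The source only *advertises* a sort key that keeps rows of the
        # same partition adjacent; it never sorts anything itself.
        cols = list(self.partition_columns)
        return lambda row: tuple(row[c] for c in cols)

    def write(self, rows: Iterable[Row]) -> None:
        # Assumed writer: one open file at a time, so input must already be
        # clustered by partition key or the write fails.
        key_of = self.required_ordering()
        closed: Set[Tuple] = set()
        current = None
        for row in rows:
            k = key_of(row)
            if k != current:
                if k in closed:
                    raise ValueError(f"partition {k} reappeared after its file was closed")
                if current is not None:
                    closed.add(current)
                self.files.append([])
                current = k
            self.files[-1].append(row)


def plan_write(table: PartitionedTable, rows: List[Row]) -> None:
    # The engine (Spark's role) asks the table for its required ordering
    # and performs the sort before invoking the writer.
    ordered = sorted(rows, key=table.required_ordering())
    table.write(ordered)


if __name__ == "__main__":
    rows = [
        {"day": "2021-11-30", "id": 1},
        {"day": "2021-12-01", "id": 2},
        {"day": "2021-11-30", "id": 3},
    ]
    table = PartitionedTable("db.events", ["day"])  # hypothetical table name
    plan_write(table, rows)
    print([[r["id"] for r in f] for f in table.files])  # [[1, 3], [2]]
```

   Calling `table.write(rows)` directly on the unsorted rows raises `ValueError`, because the `2021-11-30` partition shows up again after its file was closed. The sort supplied by the engine is what makes the write succeed.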
   