RussellSpitzer commented on issue #3564: URL: https://github.com/apache/iceberg/issues/3564#issuecomment-982768379
@andrei-ionescu, the DataSourceV2 API allows the underlying data source (Iceberg) to declare what distribution the incoming data must be in for a write to succeed. An Iceberg table that is partitioned on some set of columns will therefore always advertise that it requires the data to be sorted using an expression that keeps all the data from a single Iceberg partition gathered together. The Iceberg source only implements the advertising of this requirement; Spark actually performs the sort. See https://github.com/apache/spark/blob/ac7c52db28f35237f78215c38b274a45c1ae7462/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
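
To make the division of labour concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Spark or Iceberg code: `ToyPartitionedTable`, `required_ordering`, `engine_write` and `partitions_are_contiguous` are hypothetical names invented for illustration, and the real contract is the Java interface linked above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class ToyPartitionedTable:
    """Hypothetical stand-in for a partitioned table (not an Iceberg class)."""

    partition_columns: List[str]

    def required_ordering(self) -> List[str]:
        # The source only declares what it needs; it never sorts anything.
        return list(self.partition_columns)


def engine_write(table: ToyPartitionedTable, rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the engine's side of the contract."""
    ordering = table.required_ordering()
    # The engine reads the declared requirement and performs the sort itself
    # before handing rows to the writer.
    return sorted(rows, key=lambda row: tuple(row[col] for col in ordering))


def partitions_are_contiguous(table: ToyPartitionedTable, rows: List[Row]) -> bool:
    """Return True if all rows of each partition appear in one unbroken run."""
    seen = set()
    previous = None
    for row in rows:
        key = tuple(row[col] for col in table.partition_columns)
        if key != previous:
            if key in seen:
                # This partition was already closed and is now reopened.
                return False
            seen.add(key)
            previous = key
    return True


table = ToyPartitionedTable(partition_columns=["day"])
rows = [
    {"day": "2021-11-30", "id": 1},
    {"day": "2021-11-29", "id": 2},
    {"day": "2021-11-30", "id": 3},
]
written = engine_write(table, rows)
```

In this sketch the incoming rows interleave two partitions, while `written` has each partition's rows gathered together, even though the table object itself contains no sorting logic.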
