chenjunjiedada commented on a change in pull request #786: replace SparkDataFile with DataFile
URL: https://github.com/apache/incubator-iceberg/pull/786#discussion_r379961821

##########
File path: spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala
##########
@@ -527,9 +425,8 @@ object SparkTableUtil {
     val metricsConfig = MetricsConfig.fromProperties(targetTable.properties)
     val manifests = partitionDS
-      .flatMap(partition => listPartition(partition, serializableConf, metricsConfig))
+      .flatMap(partition => listPartition(partition, spec, serializableConf, metricsConfig))
       .repartition(numShufflePartitions)
-      .orderBy($"path")

Review comment:
BTW, even if we coalesce the files for the same partition into one manifest, I think we still have to iterate through all of the manifest entries in that manifest for partition skipping. Please correct me if I am wrong.
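
To make the point concrete, here is a minimal sketch of the behaviour as I understand it. It uses a hypothetical, simplified model: `Entry`, `Manifest`, `mayContain` and `plan` are illustration-only names and are not Iceberg classes or APIs, and partitions are reduced to a single integer value. The assumption is that a per-manifest min/max summary can rule out a whole manifest, but that any manifest which is not ruled out has all of its entries read and filtered one by one.

```scala
// Hypothetical, simplified model for illustration only; NOT Iceberg's real classes.
object PartitionSkippingSketch {
  // One manifest entry: a data file and the single partition value it belongs to.
  final case class Entry(path: String, partition: Int)

  // A manifest holds entries (assumed non-empty) plus a min/max partition summary.
  final case class Manifest(entries: Seq[Entry]) {
    val lower: Int = entries.map(_.partition).min
    val upper: Int = entries.map(_.partition).max
    // True when the summary range cannot exclude the wanted partition.
    def mayContain(p: Int): Boolean = lower <= p && p <= upper
  }

  // Returns the matching file paths and the number of entries that had to be read.
  def plan(manifests: Seq[Manifest], wanted: Int): (Seq[String], Int) = {
    // Step 1: skip whole manifests whose summary range excludes the wanted partition.
    val candidates = manifests.filter(_.mayContain(wanted))
    // Step 2: every entry of each surviving manifest is still read and filtered.
    val scanned = candidates.map(_.entries.size).sum
    val files = candidates.flatMap(_.entries).filter(_.partition == wanted).map(_.path)
    (files, scanned)
  }

  def main(args: Array[String]): Unit = {
    val m1 = Manifest(Seq(Entry("a.parquet", 1), Entry("b.parquet", 1), Entry("c.parquet", 2)))
    val m2 = Manifest(Seq(Entry("d.parquet", 5), Entry("e.parquet", 6)))
    val (files, scanned) = plan(Seq(m1, m2), wanted = 2)
    println(s"files=$files scanned=$scanned")
  }
}
```

In this example the second manifest is skipped through its summary, while all three entries of the first manifest are read even though only one of them matches. In this model, grouping a partition's files into the same manifest improves how many manifests can be skipped, not how many entries are read inside a manifest that survives.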