RussellSpitzer commented on a change in pull request #2829:
URL: https://github.com/apache/iceberg/pull/2829#discussion_r705523413
##########
File path:
spark/src/main/java/org/apache/iceberg/spark/actions/BaseRewriteDataFilesSparkAction.java
##########
@@ -149,7 +168,13 @@ public RewriteDataFiles filter(Expression expression) {
try {
Map<StructLike, List<FileScanTask>> filesByPartition =
Streams.stream(fileScanTasks)
- .collect(Collectors.groupingBy(task -> task.file().partition()));
+ .collect(Collectors.groupingBy(task -> {
+ if (task.file().specId() == table.spec().specId()) {
+ return task.file().partition();
+ } else {
+ return EmptyStruct.instance();
Review comment:
Tasks comprising data that is not partitioned according to
the current spec must be treated as if they were not partitioned. We can probably
ease this restriction for partitioning that satisfies the current partitioning
(i.e. the table is set to partition on day but this is an hour partition), but this is
the simplest approach for now.
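
To illustrate the grouping rule described above, here is a minimal, self-contained sketch. It does not use the Iceberg classes: `Task`, `UNPARTITIONED`, and `groupByPartition` are hypothetical stand-ins for `FileScanTask`, `EmptyStruct.instance()`, and the collector in the diff, and partitions are modeled as plain strings rather than `StructLike`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySpecSketch {
  // Hypothetical stand-in for a FileScanTask: just a path, the spec ID the
  // file was written with, and a string form of its partition value.
  static final class Task {
    final String path;
    final int specId;
    final String partition;

    Task(String path, int specId, String partition) {
      this.path = path;
      this.specId = specId;
      this.partition = partition;
    }
  }

  // Hypothetical stand-in for EmptyStruct.instance(): one shared key for
  // every file written under a spec other than the current one.
  static final String UNPARTITIONED = "<unpartitioned>";

  // Files written with the current spec keep their own partition as the
  // group key; all other files fall into the single unpartitioned group.
  static Map<String, List<Task>> groupByPartition(List<Task> tasks, int currentSpecId) {
    return tasks.stream()
        .collect(Collectors.groupingBy(
            task -> task.specId == currentSpecId ? task.partition : UNPARTITIONED));
  }

  public static void main(String[] args) {
    List<Task> tasks = List.of(
        new Task("a.parquet", 1, "day=2021-09-01"),
        new Task("b.parquet", 1, "day=2021-09-01"),
        new Task("c.parquet", 0, "hour=2021-09-01-10"),
        new Task("d.parquet", 0, "hour=2021-09-01-11"));

    // Spec 1 is assumed to be the table's current spec in this example.
    groupByPartition(tasks, 1).forEach((partition, group) ->
        System.out.println(partition + " -> " + group.size() + " file(s)"));
  }
}
```

In this sketch the two hour-partitioned files written under the older spec end up in the same group even though their partition values differ, which is the conservative behavior the comment describes.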
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]