rdblue commented on a change in pull request #2829:
URL: https://github.com/apache/iceberg/pull/2829#discussion_r706495504



##########
File path: spark/src/main/java/org/apache/iceberg/spark/actions/BaseRewriteDataFilesSparkAction.java
##########
@@ -149,7 +168,13 @@ public RewriteDataFiles filter(Expression expression) {
 
     try {
      Map<StructLike, List<FileScanTask>> filesByPartition = Streams.stream(fileScanTasks)
-          .collect(Collectors.groupingBy(task -> task.file().partition()));
+          .collect(Collectors.groupingBy(task -> {
+            if (task.file().specId() == table.spec().specId()) {
+              return task.file().partition();
+            } else {
+              return EmptyStruct.get();

Review comment:
      `StructLike` makes no guarantees about `equals`/`hashCode` behavior, so 
using it as a map key is like using a `CharSequence` as a map key: it will 
break if the underlying implementation changes or differs between entries. 
I'd recommend `StructLikeMap`, which guarantees consistent key behavior, but 
it requires a specific struct type. Since you really only need the table 
spec's struct type here, you can use that, and keep any tasks that are not in 
the current table spec in a separate list, like this:
   
   ```java
   StructLikeMap<List<FileScanTask>> filesByPartition = StructLikeMap.create(table.spec().partitionType());
   List<FileScanTask> tasksFromOtherSpecs = Lists.newArrayList();
   Streams.stream(fileScanTasks).forEach(task -> {
     if (task.file().specId() != table.spec().specId()) {
       tasksFromOtherSpecs.add(task);
     } else {
       // a partition can hold many files, so group tasks into a list per key
       filesByPartition.computeIfAbsent(task.file().partition(), ignored -> Lists.newArrayList())
           .add(task);
     }
   });
   ```
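   To make the `CharSequence` analogy concrete, here is a minimal, 
self-contained JDK sketch (the class name `CharSequenceKeyDemo` is 
hypothetical, not from this PR): two keys with identical characters end up as 
separate map entries because `StringBuilder` uses identity 
`equals`/`hashCode`. Raw `StructLike` keys from different implementations can 
fail the same way in a plain `HashMap`.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class CharSequenceKeyDemo {
     // Demonstrates why CharSequence is a poor map key: equals/hashCode
     // depend on the concrete implementation, not on the characters.
     static int distinctEntries() {
       Map<CharSequence, Integer> counts = new HashMap<>();
       counts.put("a", 1);                     // String key
       counts.put(new StringBuilder("a"), 2);  // same characters, different class
       // StringBuilder uses identity equals/hashCode, so this is a second entry
       return counts.size();
     }

     public static void main(String[] args) {
       System.out.println(distinctEntries()); // prints 2
     }
   }
   ```

   `StructLikeMap` avoids this by wrapping each key with the given struct 
type so that lookups compare field values rather than object identity.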




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


