RussellSpitzer opened a new issue, #5098:
URL: https://github.com/apache/iceberg/issues/5098

   Guava's `Sets.union` returns a new view on every call. In 
RewriteDataFilesCommitManager, where we merge all of the data files added and 
deleted across file groups, this produces a deeply nested set. With a huge 
number of sets we end up with a set which is a view of a set which is a view 
of a set ..., and iterating it creates a very deep call stack. This should 
probably be changed to a plain `addAll`.
   
   ```diff
   diff --git a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   index 9a5cc4c94..9c9b23988 100644
   --- a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   +++ b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   @@ -74,8 +74,8 @@ public class RewriteDataFilesCommitManager {
        Set<DataFile> rewrittenDataFiles = Sets.newHashSet();
        Set<DataFile> addedDataFiles = Sets.newHashSet();
        for (RewriteFileGroup group : fileGroups) {
   -      rewrittenDataFiles = Sets.union(rewrittenDataFiles, group.rewrittenFiles());
   -      addedDataFiles = Sets.union(addedDataFiles, group.addedFiles());
   +      rewrittenDataFiles.addAll(group.rewrittenFiles());
   +      addedDataFiles.addAll(group.addedFiles());
        }
   
        RewriteFiles rewrite = table.newRewrite().validateFromSnapshot(startingSnapshotId);
   ```
    
    
    Error message after creating a large union:
    
   ```
   Caused by: java.lang.StackOverflowError
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:729)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
       at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
   ```
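
The nesting described in this issue can be illustrated without Guava. The sketch below is only an assumption-laden model: `UnionView` is a hypothetical stand-in for a lazy union view (it is not Guava's implementation), and `mergeWithViews` / `mergeWithAddAll` are illustrative names that mirror the two versions of the loop in the diff. It shows that reassigning a view in a loop stacks one layer per file group, while `addAll` keeps a single flat `HashSet`.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

class UnionViewDepth {

  // Hypothetical lazy union view: stores references to both sets, copies nothing.
  static final class UnionView<E> extends AbstractSet<E> {
    final Set<E> left;
    final Set<E> right;

    UnionView(Set<E> left, Set<E> right) {
      this.left = left;
      this.right = right;
    }

    // Number of view layers stacked along the left-hand side, including this one.
    int depth() {
      int d = 1;
      Set<E> cur = left;
      while (cur instanceof UnionView) {
        d++;
        cur = ((UnionView<E>) cur).left;
      }
      return d;
    }

    @Override
    public Iterator<E> iterator() {
      // Asking the left side for an iterator recurses once per nested layer.
      final Iterator<E> first = left.iterator();
      final Iterator<E> second = right.iterator();
      return new Iterator<E>() {
        private E pending;
        private boolean hasPending;

        private void advance() {
          if (hasPending) {
            return;
          }
          if (first.hasNext()) {
            pending = first.next();
            hasPending = true;
            return;
          }
          while (second.hasNext()) {
            E candidate = second.next();
            if (!left.contains(candidate)) { // skip elements already seen on the left
              pending = candidate;
              hasPending = true;
              return;
            }
          }
        }

        @Override
        public boolean hasNext() {
          advance();
          return hasPending;
        }

        @Override
        public E next() {
          advance();
          if (!hasPending) {
            throw new NoSuchElementException();
          }
          hasPending = false;
          return pending;
        }
      };
    }

    @Override
    public int size() {
      int n = 0;
      for (E ignored : this) {
        n++;
      }
      return n;
    }

    @Override
    public boolean contains(Object o) {
      return left.contains(o) || right.contains(o);
    }
  }

  // Mirrors the original loop: each iteration wraps the previous result in a new view.
  static Set<String> mergeWithViews(List<Set<String>> groups) {
    Set<String> merged = new HashSet<>();
    for (Set<String> group : groups) {
      merged = new UnionView<>(merged, group);
    }
    return merged;
  }

  // Mirrors the proposed fix: elements are copied into one flat HashSet.
  static Set<String> mergeWithAddAll(List<Set<String>> groups) {
    Set<String> merged = new HashSet<>();
    for (Set<String> group : groups) {
      merged.addAll(group);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Set<String>> groups = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      Set<String> group = new HashSet<>();
      group.add("file-" + i + ".parquet");
      groups.add(group);
    }
    UnionView<String> nested = (UnionView<String>) mergeWithViews(groups);
    System.out.println("view layers: " + nested.depth());
    System.out.println("flat set size: " + mergeWithAddAll(groups).size());
  }
}
```

Both approaches yield the same elements; the difference is that every iteration over the nested view has to descend through all of its layers, which is where the recursion in the stack trace above comes from once the number of groups is large enough.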

