RussellSpitzer opened a new issue, #5098:
URL: https://github.com/apache/iceberg/issues/5098
Guava's `Sets.union` returns a view over its two arguments on every call
instead of copying elements. This ends up producing a deeply nested set in
RewriteDataFilesCommitManager, where we attempt to merge all of the data files
added and deleted together. If there is a huge number of sets, we end up with
a set which is a view of a set which is a view of a set ..., and iterating it
creates a very deep call stack. This should probably be changed to just use a
plain `addAll`.
```java
diff --git a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
index 9a5cc4c94..9c9b23988 100644
--- a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
+++ b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
@@ -74,8 +74,8 @@ public class RewriteDataFilesCommitManager {
Set<DataFile> rewrittenDataFiles = Sets.newHashSet();
Set<DataFile> addedDataFiles = Sets.newHashSet();
for (RewriteFileGroup group : fileGroups) {
- rewrittenDataFiles = Sets.union(rewrittenDataFiles, group.rewrittenFiles());
- addedDataFiles = Sets.union(addedDataFiles, group.addedFiles());
+ rewrittenDataFiles.addAll(group.rewrittenFiles());
+ addedDataFiles.addAll(group.addedFiles());
}
RewriteFiles rewrite = table.newRewrite().validateFromSnapshot(startingSnapshotId);
```
Error message after creating a large union:
```
Caused by: java.lang.StackOverflowError
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:729)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
    at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
```
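For anyone who wants to see the nesting without a large table, here is a minimal sketch that uses only `java.util`. The `unionView` helper is a hypothetical stand-in for a lazy union view, not Guava's actual implementation, and `UnionViewDemo`, `mergeWithViews`, and `mergeWithAddAll` are illustrative names that do not exist in Iceberg. It only aims to show why reassigning a view in a loop makes `iterator()` recurse once per layer, while `addAll` keeps a single flat set.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class UnionViewDemo {

  // Hypothetical lazy union: keeps references to both inputs and copies nothing,
  // so wrapping the result again adds one more layer of indirection.
  static <T> Set<T> unionView(Set<T> left, Set<T> right) {
    return new AbstractSet<T>() {
      @Override
      public Iterator<T> iterator() {
        // If left is itself a view, this call recurses once per nested layer.
        Iterator<T> first = left.iterator();
        Iterator<T> second = right.iterator();
        return new Iterator<T>() {
          private T next;
          private boolean ready;

          @Override
          public boolean hasNext() {
            if (ready) {
              return true;
            }
            if (first.hasNext()) {
              next = first.next();
              ready = true;
              return true;
            }
            // Skip elements of right that were already produced from left.
            while (second.hasNext()) {
              T candidate = second.next();
              if (!left.contains(candidate)) {
                next = candidate;
                ready = true;
                return true;
              }
            }
            return false;
          }

          @Override
          public T next() {
            if (!hasNext()) {
              throw new NoSuchElementException();
            }
            ready = false;
            return next;
          }
        };
      }

      @Override
      public int size() {
        int count = 0;
        for (Iterator<T> it = iterator(); it.hasNext(); it.next()) {
          count++;
        }
        return count;
      }

      @Override
      public boolean contains(Object o) {
        return left.contains(o) || right.contains(o);
      }
    };
  }

  // Mirrors the old loop: every iteration wraps the previous result in a new view.
  static <T> Set<T> mergeWithViews(List<Set<T>> groups) {
    Set<T> merged = new HashSet<>();
    for (Set<T> group : groups) {
      merged = unionView(merged, group);
    }
    return merged;
  }

  // Mirrors the proposed fix: one flat HashSet, no nesting.
  static <T> Set<T> mergeWithAddAll(List<Set<T>> groups) {
    Set<T> merged = new HashSet<>();
    for (Set<T> group : groups) {
      merged.addAll(group);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Set<Integer>> groups = new ArrayList<>();
    for (int i = 0; i < 200_000; i++) {
      groups.add(Collections.singleton(i));
    }
    System.out.println("addAll size: " + mergeWithAddAll(groups).size());
    try {
      mergeWithViews(groups).iterator();
      System.out.println("nested views: no overflow at this depth");
    } catch (StackOverflowError e) {
      System.out.println("nested views: StackOverflowError");
    }
  }
}
```

Whether the nested case actually overflows depends on the JVM's thread stack size, so the depth of 200,000 in `main` is an arbitrary choice and may need adjusting.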