rdblue opened a new pull request #1165: URL: https://github.com/apache/iceberg/pull/1165
`MergingSnapshotProducer` was recently refactored to separate out manifest filtering and merging so that it could be reused for delete files. That refactor also updated the filter to use a `CharSequenceSet` instead of a `HashSet` and `CharSequenceWrapper`. The `CharSequenceSet` embeds the wrapper and is easier to use, but this introduced a bug: multiple threads using the same `CharSequenceSet` would share the same wrapper instance in `contains`. As a result, deletes of specific files could miss data files if another thread reused the wrapper while a thread was testing whether a file that should be deleted was in the delete set.

The solution is to make `CharSequenceSet` thread safe by using a thread-local wrapper.

This only affects operations that delete specific files, which are mostly in tests. Spark deletes using an expression or using partition tuples, so those paths are not affected. The user-facing operations that are affected are `RewriteFiles` and `SnapshotManager` (when cherry-picking a dynamic overwrite commit). `OverwriteFiles` also exposes the method, but it is only used in tests.
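
For readers unfamiliar with the pattern, the thread-local wrapper approach described in this PR can be sketched roughly as follows. This is a minimal standalone illustration, not the code in the PR: `ThreadLocalWrapperSketch`, `Wrapper`, `SketchSet`, and `countMisses` are hypothetical names standing in for `CharSequenceWrapper` and `CharSequenceSet`, and the real classes may differ in detail. The sketch only protects the lookup wrapper; it assumes the set is fully populated before any concurrent `contains` calls.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalWrapperSketch {

  // Hypothetical stand-in for CharSequenceWrapper: a mutable holder that
  // compares and hashes by character content rather than by object identity.
  static final class Wrapper {
    private CharSequence value;

    Wrapper set(CharSequence newValue) {
      this.value = newValue;
      return this;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Wrapper)) {
        return false;
      }
      CharSequence a = value;
      CharSequence b = ((Wrapper) other).value;
      if (a == null || b == null) {
        return a == b;
      }
      if (a.length() != b.length()) {
        return false;
      }
      for (int i = 0; i < a.length(); i++) {
        if (a.charAt(i) != b.charAt(i)) {
          return false;
        }
      }
      return true;
    }

    @Override
    public int hashCode() {
      if (value == null) {
        return 0;
      }
      int result = 0;
      for (int i = 0; i < value.length(); i++) {
        result = 31 * result + value.charAt(i);
      }
      return result;
    }
  }

  // Hypothetical stand-in for CharSequenceSet.
  static final class SketchSet {
    private final Set<Wrapper> wrappers = new HashSet<>();

    // Each thread gets its own reusable lookup wrapper, so one thread can
    // no longer overwrite the value another thread is in the middle of testing.
    private final ThreadLocal<Wrapper> lookupWrapper = ThreadLocal.withInitial(Wrapper::new);

    boolean add(CharSequence seq) {
      // Stored entries need their own wrapper because they outlive the call.
      return wrappers.add(new Wrapper().set(seq));
    }

    boolean contains(CharSequence seq) {
      Wrapper wrapper = lookupWrapper.get();
      boolean found = wrappers.contains(wrapper.set(seq));
      wrapper.set(null); // do not keep a reference to the caller's sequence
      return found;
    }
  }

  // Runs concurrent lookups for paths that are all in the set and counts misses.
  static int countMisses(int threadCount, int lookupsPerThread) {
    SketchSet deletes = new SketchSet();
    for (int i = 0; i < 100; i++) {
      deletes.add("file-" + i + ".parquet");
    }
    AtomicInteger misses = new AtomicInteger();
    Thread[] threads = new Thread[threadCount];
    for (int t = 0; t < threadCount; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < lookupsPerThread; i++) {
          CharSequence path = new StringBuilder("file-").append(i % 100).append(".parquet");
          if (!deletes.contains(path)) {
            misses.incrementAndGet();
          }
        }
      });
      threads[t].start();
    }
    try {
      for (Thread thread : threads) {
        thread.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for lookups", e);
    }
    return misses.get();
  }

  public static void main(String[] args) {
    System.out.println("misses: " + countMisses(8, 100_000));
  }
}
```

If `lookupWrapper` were replaced by a single shared `Wrapper` field, two threads calling `contains` at the same time could overwrite each other's value between `set` and the hash lookup, which is the kind of missed match described above.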
