rdblue opened a new pull request #1165:
URL: https://github.com/apache/iceberg/pull/1165


   `MergingSnapshotProducer` was recently refactored to separate out manifest 
filtering and merging so it could be reused for delete files. That refactor 
also updated the filter to use a `CharSequenceSet` instead of a `HashSet` and 
`CharSequenceWrapper`. The `CharSequenceSet` embeds the wrapper and is easier 
to use, but this introduced a bug: multiple threads using the same 
`CharSequenceSet` would share the same wrapper in `contains`. As a result, 
deletes of specific files could miss data files when another thread reused the 
wrapper while a file that was in the delete set was being tested.
   
   The solution is to make `CharSequenceSet` thread safe by using a 
thread-local wrapper.
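   
   To illustrate the idea, here is a minimal sketch of a set that keeps its 
lookup wrapper in a `ThreadLocal`. It is not the code in this PR: `SeqWrapper` 
and `SafeSeqSet` are hypothetical names, and the sketch assumes the set is 
fully populated before any concurrent `contains` calls.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper that gives any CharSequence content-based equals/hashCode.
final class SeqWrapper {
  private CharSequence seq;

  SeqWrapper set(CharSequence newSeq) {
    this.seq = newSeq;
    return this;
  }

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof SeqWrapper)) {
      return false;
    }
    CharSequence that = ((SeqWrapper) other).seq;
    if (seq == null || that == null) {
      return seq == that;
    }
    if (seq.length() != that.length()) {
      return false;
    }
    for (int i = 0; i < seq.length(); i++) {
      if (seq.charAt(i) != that.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  @Override
  public int hashCode() {
    int hash = 0;
    if (seq != null) {
      for (int i = 0; i < seq.length(); i++) {
        hash = 31 * hash + seq.charAt(i);
      }
    }
    return hash;
  }
}

// Hypothetical set of CharSequences; assumes all add() calls finish before
// other threads start calling contains().
final class SafeSeqSet {
  private final Set<SeqWrapper> wrappers = new HashSet<>();

  // One lookup wrapper per thread, so concurrent contains() calls never
  // overwrite each other's sequence mid-lookup.
  private final ThreadLocal<SeqWrapper> lookup = ThreadLocal.withInitial(SeqWrapper::new);

  boolean add(CharSequence seq) {
    // Stored wrappers are created per element and never mutated afterwards.
    return wrappers.add(new SeqWrapper().set(seq));
  }

  boolean contains(CharSequence seq) {
    SeqWrapper wrapper = lookup.get();
    boolean found = wrappers.contains(wrapper.set(seq));
    wrapper.set(null); // do not keep a reference to the caller's sequence
    return found;
  }
}
```

   With a single shared wrapper field instead of the `ThreadLocal`, one thread 
could call `set` between another thread's `set` and the hash lookup, so a path 
that is in the set could be reported as missing.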
   
   This only affects operations that delete specific files, which are mostly in 
tests. Spark deletes using an expression or partition tuples, so it is not affected. 
User-facing operations that are affected are `RewriteFiles` and 
`SnapshotManager` (when cherry-picking a dynamic overwrite commit). 
`OverwriteFiles` also exposes the method, but it is only used in tests.




