aokolnychyi commented on a change in pull request #3287:
URL: https://github.com/apache/iceberg/pull/3287#discussion_r733189769
##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -107,6 +108,29 @@ public static StructLikeSet toEqualitySet(CloseableIterable<StructLike> eqDelete
}
}
+ public static Roaring64Bitmap toPositionBitMap(CharSequence dataLocation,
+ CloseableIterable<? extends StructLike> deleteFile) {
+ return toPositionBitMap(dataLocation, ImmutableList.of(deleteFile));
+ }
+
+ public static <T extends StructLike> Roaring64Bitmap toPositionBitMap(CharSequence dataLocation,
+ List<CloseableIterable<T>> deleteFiles) {
+ DataFileFilter<T> locationFilter = new DataFileFilter<>(dataLocation);
+ List<CloseableIterable<Long>> positions = Lists.transform(deleteFiles, deletes ->
+ CloseableIterable.transform(locationFilter.filter(deletes), row -> (Long) POSITION_ACCESSOR.get(row)));
+ return toPositionBitMap(CloseableIterable.concat(positions));
+ }
+
+ public static Roaring64Bitmap toPositionBitMap(CloseableIterable<Long> posDeletes) {
+ try (CloseableIterable<Long> deletes = posDeletes) {
+ Roaring64Bitmap bitmap = new Roaring64Bitmap();
Review comment:
Also, we should take into account that we are computing delete positions
for a single data file. Since the user runs compactions, the expectation is
that the number of positions stays reasonable.
To sum up, I'd test `RoaringBitmap` for sparse deletes. If it performs well,
use it, as it will be more efficient when we have many deletes. If it occupies
a lot of memory for that use case, let's just use `Set<Long>`.
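
One way to keep that choice open while testing is to hide the container behind a small interface, so that the `Set<Long>` baseline and a bitmap-backed variant can be swapped and measured against the same sparse input. The sketch below is only an illustration under stated assumptions: `PositionIndex`, `SetPositionIndex`, and `PositionIndexes` are hypothetical names that do not exist in Iceberg, and only the `Set<Long>` baseline is shown so the example depends on nothing beyond the JDK.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction over the delete-position container (not an Iceberg API). */
interface PositionIndex {
  void delete(long position);

  boolean isDeleted(long position);

  long cardinality();
}

/** Baseline backed by Set<Long>: simple, but every position is stored as a boxed Long. */
final class SetPositionIndex implements PositionIndex {
  private final Set<Long> positions = new HashSet<>();

  @Override
  public void delete(long position) {
    positions.add(position);
  }

  @Override
  public boolean isDeleted(long position) {
    return positions.contains(position);
  }

  @Override
  public long cardinality() {
    return positions.size();
  }
}

/** Hypothetical factory: the one place where the backing container is chosen. */
final class PositionIndexes {
  private PositionIndexes() {
  }

  static PositionIndex fromPositions(Iterable<Long> posDeletes) {
    // Assumption: the positions were already filtered down to a single data file.
    PositionIndex index = new SetPositionIndex();
    for (Long pos : posDeletes) {
      index.delete(pos);
    }
    return index;
  }
}
```

A bitmap-backed implementation would wrap a `Roaring64Bitmap` behind the same three methods; comparing the retained heap of both variants on a sparse set of positions is the test suggested above.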
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]