[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7642: [HUDI-5534] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

via GitHub Wed, 25 Jan 2023 14:37:11 -0800


alexeykudinkin commented on code in PR #7642:
URL: https://github.com/apache/hudi/pull/7642#discussion_r1087254735



##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -227,7 +235,11 @@ public Map<Pair<String, String>, BloomFilter> getBloomFilters(final List<Pair<St
         if (bloomFilterMetadata.isPresent()) {
           if (!bloomFilterMetadata.get().getIsDeleted()) {
             ValidationUtils.checkState(fileToKeyMap.containsKey(entry.getLeft()));
-            final ByteBuffer bloomFilterByteBuffer = bloomFilterMetadata.get().getBloomFilter();
+            // NOTE: We have to duplicate the [[ByteBuffer]] object here since:
+            //        - Reading out [[ByteBuffer]] mutates its state
+            //        - [[BloomFilterMetadata]] could be re-used, and hence have to stay immutable
+            final ByteBuffer bloomFilterByteBuffer =
+                bloomFilterMetadata.get().getBloomFilter().duplicate();

Review Comment:
   Each file has a unique Bloom filter, but we're now caching the records.
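
   To illustrate the NOTE in the hunk above, here is a minimal, self-contained sketch of why reading a `ByteBuffer` through `duplicate()` leaves a cached buffer reusable. The class name `ByteBufferDuplicateDemo`, the helper `readAll`, and the sample bytes are assumptions made up for this illustration; they are not part of the PR or of Hudi.

```java
import java.nio.ByteBuffer;

public class ByteBufferDuplicateDemo {

    // Hypothetical helper: drains all remaining bytes, which advances
    // the buffer's position up to its limit.
    public static byte[] readAll(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Stands in for the serialized Bloom filter held by a cached record
        // (assumed sample bytes, not real filter data).
        ByteBuffer cached = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});

        // duplicate() shares the underlying bytes but has its own position,
        // limit and mark, so the cached buffer is left untouched.
        byte[] viaDuplicate = readAll(cached.duplicate());
        System.out.println(viaDuplicate.length + " bytes read via duplicate, "
            + cached.remaining() + " still readable in the cached buffer");

        // Reading the cached buffer directly consumes it, so a later
        // reader of the same record would find no bytes left.
        readAll(cached);
        System.out.println(cached.remaining() + " readable after a direct read");
    }
}
```

   Note that `duplicate()` does not copy the bytes. Only the read cursor is independent, so this protects against position changes, not against writes to the shared content.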



##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -123,8 +126,10 @@ public List<String> getAllPartitionPaths() throws IOException {
         throw new HoodieMetadataException("Failed to retrieve list of partition from metadata", e);
       }
     }
-    return new FileSystemBackedTableMetadata(getEngineContext(), hadoopConf, dataBasePath.toString(),
-        metadataConfig.shouldAssumeDatePartitioning()).getAllPartitionPaths();
+
+    FileSystemBackedTableMetadata fileSystemBackedTableMetadata =
+        createFileSystemBackedTableMetadata();
+    return fileSystemBackedTableMetadata.getAllPartitionPaths();

Review Comment:
   These changes are to avoid duplication (needed to update the way we fetch `hadoopConf`)



##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileReaderBase.java:
##########
@@ -37,7 +37,7 @@ abstract class HoodieAvroFileReaderBase implements HoodieAvroFileReader {
   @Override
   public ClosableIterator<HoodieRecord<IndexedRecord>> getRecordIterator(Schema readerSchema, Schema requestedSchema) throws IOException {
     ClosableIterator<IndexedRecord> iterator = getIndexedRecordIterator(readerSchema, requestedSchema);
-    return new MappingIterator<>(iterator, data -> unsafeCast(new HoodieAvroIndexedRecord(data)));
+    return new CloseableMappingIterator<>(iterator, data -> unsafeCast(new HoodieAvroIndexedRecord(data)));

Review Comment:
   I think I know how this crept in. Let me clean it up.



