[GitHub] [hudi] yihua commented on a diff in pull request #7642: [HUDI-5534] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

via GitHub Wed, 25 Jan 2023 14:15:39 -0800


yihua commented on code in PR #7642:
URL: https://github.com/apache/hudi/pull/7642#discussion_r1087236760



##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -227,7 +235,11 @@ public Map<Pair<String, String>, BloomFilter> 
getBloomFilters(final List<Pair<St
         if (bloomFilterMetadata.isPresent()) {
           if (!bloomFilterMetadata.get().getIsDeleted()) {
             
ValidationUtils.checkState(fileToKeyMap.containsKey(entry.getLeft()));
-            final ByteBuffer bloomFilterByteBuffer = 
bloomFilterMetadata.get().getBloomFilter();
+            // NOTE: We have to duplicate the [[ByteBuffer]] object here since:
+            //        - Reading out [[ByteBuffer]] mutates its state
+            //        - [[BloomFilterMetadata]] could be re-used, and hence 
have to stay immutable
+            final ByteBuffer bloomFilterByteBuffer =
+                bloomFilterMetadata.get().getBloomFilter().duplicate();

Review Comment:
   Supposedly each file has its unique bloom filter.  Why could it be reused 
somehow?



##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -123,8 +126,10 @@ public List<String> getAllPartitionPaths() throws 
IOException {
         throw new HoodieMetadataException("Failed to retrieve list of 
partition from metadata", e);
       }
     }
-    return new FileSystemBackedTableMetadata(getEngineContext(), hadoopConf, 
dataBasePath.toString(),
-        metadataConfig.shouldAssumeDatePartitioning()).getAllPartitionPaths();
+
+    FileSystemBackedTableMetadata fileSystemBackedTableMetadata =
+        createFileSystemBackedTableMetadata();
+    return fileSystemBackedTableMetadata.getAllPartitionPaths();

Review Comment:
   these changes seem not notable.



##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileReaderBase.java:
##########
@@ -37,7 +37,7 @@ abstract class HoodieAvroFileReaderBase implements 
HoodieAvroFileReader {
   @Override
   public ClosableIterator<HoodieRecord<IndexedRecord>> 
getRecordIterator(Schema readerSchema, Schema requestedSchema) throws 
IOException {
     ClosableIterator<IndexedRecord> iterator = 
getIndexedRecordIterator(readerSchema, requestedSchema);
-    return new MappingIterator<>(iterator, data -> unsafeCast(new 
HoodieAvroIndexedRecord(data)));
+    return new CloseableMappingIterator<>(iterator, data -> unsafeCast(new 
HoodieAvroIndexedRecord(data)));

Review Comment:
   Can we move these changes not related to Bloom Index into a separate PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] yihua commented on a diff in pull request #7642: [HUDI-5534] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

Reply via email to