yihua commented on code in PR #7642:
URL: https://github.com/apache/hudi/pull/7642#discussion_r1087236760
##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -227,7 +235,11 @@ public Map<Pair<String, String>, BloomFilter>
getBloomFilters(final List<Pair<St
if (bloomFilterMetadata.isPresent()) {
if (!bloomFilterMetadata.get().getIsDeleted()) {
ValidationUtils.checkState(fileToKeyMap.containsKey(entry.getLeft()));
- final ByteBuffer bloomFilterByteBuffer =
bloomFilterMetadata.get().getBloomFilter();
+ // NOTE: We have to duplicate the [[ByteBuffer]] object here since:
+ // - Reading out [[ByteBuffer]] mutates its state
+ // - [[BloomFilterMetadata]] could be re-used, and hence
have to stay immutable
+ final ByteBuffer bloomFilterByteBuffer =
+ bloomFilterMetadata.get().getBloomFilter().duplicate();
Review Comment:
Supposedly each file has its unique bloom filter. Why could it be reused
somehow?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -123,8 +126,10 @@ public List<String> getAllPartitionPaths() throws
IOException {
throw new HoodieMetadataException("Failed to retrieve list of
partition from metadata", e);
}
}
- return new FileSystemBackedTableMetadata(getEngineContext(), hadoopConf,
dataBasePath.toString(),
- metadataConfig.shouldAssumeDatePartitioning()).getAllPartitionPaths();
+
+ FileSystemBackedTableMetadata fileSystemBackedTableMetadata =
+ createFileSystemBackedTableMetadata();
+ return fileSystemBackedTableMetadata.getAllPartitionPaths();
Review Comment:
these changes seem not notable.
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileReaderBase.java:
##########
@@ -37,7 +37,7 @@ abstract class HoodieAvroFileReaderBase implements
HoodieAvroFileReader {
@Override
public ClosableIterator<HoodieRecord<IndexedRecord>>
getRecordIterator(Schema readerSchema, Schema requestedSchema) throws
IOException {
ClosableIterator<IndexedRecord> iterator =
getIndexedRecordIterator(readerSchema, requestedSchema);
- return new MappingIterator<>(iterator, data -> unsafeCast(new
HoodieAvroIndexedRecord(data)));
+ return new CloseableMappingIterator<>(iterator, data -> unsafeCast(new
HoodieAvroIndexedRecord(data)));
Review Comment:
Can we move these changes not related to Bloom Index into a separate PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]