prashantwason commented on a change in pull request #1687:
URL: https://github.com/apache/hudi/pull/1687#discussion_r440406417
##########
File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java
##########
@@ -66,4 +67,21 @@
     return new HoodieParquetWriter<>(instantTime, path, parquetConfig, schema, sparkTaskContextSupplier);
   }
 +
 +  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieStorageWriter<R> newHFileStorageWriter(
 +      String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
 +      SparkTaskContextSupplier sparkTaskContextSupplier) throws IOException {
 +
 +    BloomFilter filter = createBloomFilter(config);
Review comment:
Nope. We are serializing the HoodieBloomFilter to bytes and saving it as a metadata block in the HFile. HFile supports adding custom named blocks of data that are loaded on demand (unlike the Parquet footer, which is always read).
HFile also has its own BloomFilter interface, which HoodieBloomFilter could implement, but that sounds like overkill to me.
As more base file formats are added, I think we should focus on common functionality so that we prevent code duplication.
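To make the idea concrete, here is a minimal, self-contained sketch of the pattern being described: a bloom filter serialized to bytes, stored under a named key, and deserialized only when that block is requested. This deliberately does not use the real HFile or Hudi APIs; the `MetaBlockSketch` class, the `"bloom_filter"` block name, and the use of `BitSet` as a stand-in filter are all illustrative assumptions.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified illustration (NOT the actual HFile/Hudi API): named byte blocks
// that are only deserialized on demand, mirroring HFile's custom meta blocks.
public class MetaBlockSketch {
  // Hypothetical stand-in for HFile's named meta-block storage.
  private final Map<String, byte[]> metaBlocks = new HashMap<>();

  // Serialize the filter to bytes and store it under a well-known block name.
  public void writeBloomFilterBlock(BitSet filterBits) {
    metaBlocks.put("bloom_filter", filterBits.toByteArray());
  }

  // Deserialize the block on demand; other blocks are never touched,
  // unlike a Parquet footer, which is read in full.
  public BitSet readBloomFilterBlock() {
    return BitSet.valueOf(metaBlocks.get("bloom_filter"));
  }

  public static void main(String[] args) {
    // A toy "bloom filter": bits set for two record keys.
    BitSet filter = new BitSet(64);
    filter.set(3);
    filter.set(42);

    MetaBlockSketch sketch = new MetaBlockSketch();
    sketch.writeBloomFilterBlock(filter);

    BitSet restored = sketch.readBloomFilterBlock();
    System.out.println(restored.get(3) && restored.get(42) && !restored.get(7));
  }
}
```

The round trip prints `true`: the restored filter answers membership queries identically to the original, which is all the reader needs from the metadata block.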
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]