prashantwason commented on a change in pull request #1687:
URL: https://github.com/apache/hudi/pull/1687#discussion_r435499599
##########
File path:
hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java
##########
@@ -66,4 +67,21 @@
return new HoodieParquetWriter<>(instantTime, path, parquetConfig, schema, sparkTaskContextSupplier);
}
+
+ private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieStorageWriter<R> newHFileStorageWriter(
+ String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
+ SparkTaskContextSupplier sparkTaskContextSupplier) throws IOException {
+
+ BloomFilter filter = createBloomFilter(config);
Review comment:
HFile lets you add Bloom filters through an interface, but they are
optional and not always present (i.e. there is no equivalent of the
AvroWriteSupport implementation).
I think there are some benefits to using Hoodie's BloomFilter:
1. If we use HBase's implementation of BloomFilter, we will have to convert
it to a HoodieBloomFilter, because Hoodie code uses HoodieBloomFilter in
its functions.
2. Advances in Hoodie's Bloom filter (compression, dynamic sizing, etc.)
would not be available to the HFile format.
3. A common interface across all base file formats is easier to maintain
and update in the long run.
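
To illustrate point 3, here is a minimal sketch and not Hudi's actual code. KeyFilter, ToyBloomFilter, SketchFileWriter, and newWriter are hypothetical names made up for this example, and the "bloomFilter" footer key is an assumption. It only shows the shape of the idea: every base file format receives the same filter type and stores its serialized form, so callers never convert between filter implementations.

```java
import java.util.Base64;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical format-independent filter contract (not a Hudi or HBase API).
interface KeyFilter {
  void add(String key);
  boolean mightContain(String key);
  String serializeToString();
}

// Toy Bloom filter using double hashing; for illustration only.
final class ToyBloomFilter implements KeyFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  ToyBloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // i-th probe position, derived from two hash values of the key.
  private int position(String key, int i) {
    int h1 = key.hashCode();
    int h2 = (h1 >>> 16) | 1; // odd, hence a nonzero step between probes
    return Math.floorMod(h1 + i * h2, numBits);
  }

  @Override
  public void add(String key) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(key, i));
    }
  }

  @Override
  public boolean mightContain(String key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(key, i))) {
        return false;
      }
    }
    return true;
  }

  @Override
  public String serializeToString() {
    return Base64.getEncoder().encodeToString(bits.toByteArray());
  }
}

// Hypothetical writer: the format differs, the filter handling does not.
final class SketchFileWriter {
  private final String format;
  private final KeyFilter filter;
  private final Map<String, String> footer = new HashMap<>();

  SketchFileWriter(String format, KeyFilter filter) {
    this.format = format;
    this.filter = filter;
  }

  // A real writer would also persist the record; here we only track the key.
  void write(String recordKey) {
    filter.add(recordKey);
  }

  // On close, the serialized filter goes into the file's metadata.
  void close() {
    footer.put("bloomFilter", filter.serializeToString());
  }

  String format() {
    return format;
  }

  Map<String, String> footer() {
    return footer;
  }
}

final class BloomFilterSketch {
  // Hypothetical factory: every supported format gets the same filter type.
  static SketchFileWriter newWriter(String extension, KeyFilter filter) {
    switch (extension) {
      case "parquet":
      case "hfile":
        return new SketchFileWriter(extension, filter);
      default:
        throw new UnsupportedOperationException(extension + " format not supported");
    }
  }

  public static void main(String[] args) {
    SketchFileWriter parquet = newWriter("parquet", new ToyBloomFilter(1024, 3));
    SketchFileWriter hfile = newWriter("hfile", new ToyBloomFilter(1024, 3));
    for (String key : new String[] {"key-1", "key-2"}) {
      parquet.write(key);
      hfile.write(key);
    }
    parquet.close();
    hfile.close();
    System.out.println(parquet.footer().equals(hfile.footer()));
  }
}
```

Because both writers depend only on the shared interface, an improvement to the filter implementation reaches every format without touching the writers.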