prashantwason commented on a change in pull request #1687:
URL: https://github.com/apache/hudi/pull/1687#discussion_r440406417
##########
File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java
##########
@@ -66,4 +67,21 @@
     return new HoodieParquetWriter<>(instantTime, path, parquetConfig, schema, sparkTaskContextSupplier);
   }
 +
 +  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieStorageWriter<R> newHFileStorageWriter(
 +      String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
 +      SparkTaskContextSupplier sparkTaskContextSupplier) throws IOException {
 +
 +    BloomFilter filter = createBloomFilter(config);
Review comment:
Nope. We are serializing the HoodieBloomFilter to bytes and saving it as a metadata block in the HFile. HFile supports adding custom named blocks of data that are loaded on demand (unlike the Parquet footer, which is always read).
HFile also has its own BloomFilter interface, which HoodieBloomFilter could implement, but that sounds like overkill to me.
As more base file formats are added, I think we should focus on common functionality so that we prevent code duplication.
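To make the idea concrete, here is a minimal, self-contained sketch of the pattern being described: a bloom filter serialized to bytes, stored under a named key, and deserialized only when that block is requested. This deliberately does not use the real HFile or Hudi APIs; the `MetaBlockSketch` class, the `"bloom_filter"` block name, and the use of `BitSet` as a stand-in filter are all illustrative assumptions.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified illustration (NOT the actual HFile/Hudi API): named byte blocks
// that are only deserialized on demand, mirroring HFile's custom meta blocks.
public class MetaBlockSketch {
  // Hypothetical stand-in for HFile's named meta-block storage.
  private final Map<String, byte[]> metaBlocks = new HashMap<>();

  // Serialize the filter to bytes and store it under a well-known block name.
  public void writeBloomFilterBlock(BitSet filterBits) {
    metaBlocks.put("bloom_filter", filterBits.toByteArray());
  }

  // Deserialize the block on demand; other blocks are never touched,
  // unlike a Parquet footer, which is read in full.
  public BitSet readBloomFilterBlock() {
    return BitSet.valueOf(metaBlocks.get("bloom_filter"));
  }

  public static void main(String[] args) {
    // A toy "bloom filter": bits set for two record keys.
    BitSet filter = new BitSet(64);
    filter.set(3);
    filter.set(42);

    MetaBlockSketch sketch = new MetaBlockSketch();
    sketch.writeBloomFilterBlock(filter);

    BitSet restored = sketch.readBloomFilterBlock();
    System.out.println(restored.get(3) && restored.get(42) && !restored.get(7));
  }
}
```

The round trip prints `true`: the restored filter answers membership queries identically to the original, which is all the reader needs from the metadata block.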
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]