[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1320: [HUDI-571] Add min/max headers on archived files

GitBox Wed, 12 Feb 2020 12:06:58 -0800

satishkotha commented on a change in pull request #1320: [HUDI-571] Add min/max 
headers on archived files
URL: https://github.com/apache/incubator-hudi/pull/1320#discussion_r378484755


 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
 ##########
 @@ -182,8 +183,11 @@ private String getMetadataKey(String action) {
           //read the avro blocks
           while (reader.hasNext()) {
             HoodieAvroDataBlock blk = (HoodieAvroDataBlock) reader.next();
-            // TODO If we can store additional metadata in datablock, we can skip parsing records
-            // (such as startTime, endTime of records in the block)
+            if (isDataOutOfRange(blk, filter)) {
 
 Review comment:
   No. In the current implementation, the first block tracks the range for the 
entire file. In some cases there are a lot of archived files, and it's much 
faster to skip an entire file when looking at older ranges.
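
   To illustrate the file-level skip, here is a minimal sketch, not the code 
in this PR. `TimeRange`, `isDisjointFrom` and `filesToRead` are hypothetical 
names, and it assumes instant times are fixed-width timestamp strings that can 
be compared lexicographically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArchivedRangeSketch {

  /** Hypothetical stand-in for the min/max instant times kept in a file's first block. */
  static final class TimeRange {
    final String min;
    final String max;

    TimeRange(String min, String max) {
      this.min = min;
      this.max = max;
    }

    /** True when this range and the other range share no instant. */
    boolean isDisjointFrom(TimeRange other) {
      return max.compareTo(other.min) < 0 || min.compareTo(other.max) > 0;
    }
  }

  /** Returns the names of files whose range overlaps the filter; the rest are skipped unread. */
  static List<String> filesToRead(List<String> names, List<TimeRange> ranges, TimeRange filter) {
    List<String> keep = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) {
      if (!ranges.get(i).isDisjointFrom(filter)) {
        keep.add(names.get(i));
      }
    }
    return keep;
  }

  public static void main(String[] args) {
    // Made-up file names and ranges, for illustration only.
    List<String> names = Arrays.asList("archive-1", "archive-2", "archive-3");
    List<TimeRange> ranges = Arrays.asList(
        new TimeRange("20200101000000", "20200115000000"),
        new TimeRange("20200116000000", "20200131000000"),
        new TimeRange("20200201000000", "20200212000000"));
    TimeRange filter = new TimeRange("20200105000000", "20200110000000");
    System.out.println(filesToRead(names, ranges, filter));
  }
}
```

   With one range per file, a query over an old window only has to read the 
first block of each file before deciding whether to parse the rest.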
   
   The overhead of storing metadata on every block seemed high. By default, we 
group 10 records into one block, which translates to about 10KB in size. A 
header with min/max on every block adds 40 bytes, so the resulting 0.4% 
overhead seemed high to me. Let me know if you think we can ignore the 
overhead; I can move this to per-block.
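
   The overhead estimate can be reproduced with a small sketch. The class and 
method names are hypothetical, and the 1024-byte record size is an assumption 
chosen so that 10 records come to roughly 10KB.

```java
final class BlockHeaderOverheadSketch {

  /** Header size as a percentage of the payload it describes. */
  static double overheadPercent(long headerBytes, long payloadBytes) {
    return 100.0 * headerBytes / payloadBytes;
  }

  public static void main(String[] args) {
    long recordsPerBlock = 10;   // default grouping mentioned in the comment
    long bytesPerRecord = 1024;  // assumption: about 1KB per archived record
    long headerBytes = 40;       // min/max header size quoted in the comment
    long blocksPerFile = 100;    // assumption, for the per-file comparison only

    long blockBytes = recordsPerBlock * bytesPerRecord;

    // Header on every block: cost is relative to one block's payload.
    System.out.printf("per-block header: %.3f%%%n",
        overheadPercent(headerBytes, blockBytes));

    // Header on the first block only: the same 40 bytes are spread over the whole file.
    System.out.printf("per-file header:  %.4f%%%n",
        overheadPercent(headerBytes, blockBytes * blocksPerFile));
  }
}
```

   Under these assumptions the per-block figure comes to just under 0.4%, 
while a single per-file header shrinks in proportion to the number of blocks 
in the file.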

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
