satishkotha commented on a change in pull request #1320: [HUDI-571] Add min/max
headers on archived files
URL: https://github.com/apache/incubator-hudi/pull/1320#discussion_r378484755
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
##########
@@ -182,8 +183,11 @@ private String getMetadataKey(String action) {
//read the avro blocks
while (reader.hasNext()) {
HoodieAvroDataBlock blk = (HoodieAvroDataBlock) reader.next();
- // TODO If we can store additional metadata in datablock, we can skip parsing records
- // (such as startTime, endTime of records in the block)
+ if (isDataOutOfRange(blk, filter)) {
Review comment:
No. In the current implementation, the first block tracks the range for the
entire file. In some cases there are a lot of archived files, and it's much
faster to skip an entire file when looking at older ranges.
The overhead of storing metadata on every block seemed high. By default, we
group 10 records into one block, which translates to about 10KB. A min/max
header on every block adds 40 bytes, so roughly 0.4% overhead, which seemed
high to me. Let me know if you think we can ignore the overhead; I can move
this to per-block headers.
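
For illustration, the file-level skip and the overhead estimate above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `ArchivedRangeSketch`, `isOutOfRange` and `headerOverheadPercent` are hypothetical names, instants are assumed to be fixed-width timestamp strings (so string order matches time order), and 10KB is taken as 10,000 bytes.

```java
public class ArchivedRangeSketch {

    // Hypothetical helper (not the PR's isDataOutOfRange): returns true when
    // data whose instants span [dataMin, dataMax] cannot overlap the query
    // range [queryStart, queryEnd]. Assumes fixed-width timestamp strings,
    // so lexicographic comparison agrees with chronological order.
    static boolean isOutOfRange(String dataMin, String dataMax,
                                String queryStart, String queryEnd) {
        return dataMax.compareTo(queryStart) < 0
            || dataMin.compareTo(queryEnd) > 0;
    }

    // Header cost expressed as a percentage of the block size.
    static double headerOverheadPercent(long headerBytes, long blockBytes) {
        return 100.0 * headerBytes / blockBytes;
    }

    public static void main(String[] args) {
        // An archived file covering January is skipped for a February query.
        System.out.println(isOutOfRange(
            "20200101000000", "20200131235959",
            "20200201000000", "20200229235959"));

        // 40-byte min/max header on a block of about 10KB (assumed 10,000 bytes).
        System.out.println(headerOverheadPercent(40, 10_000));
    }
}
```

With the range kept only on the first block, one such check decides whether the whole file is read at all. Per-block headers would apply the same check to every block, at the cost of the per-block overhead estimated above.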