Zhangshunyu commented on issue #6758: URL: https://github.com/apache/hudi/issues/6758#issuecomment-1258848379
@alexeykudinkin @yihua Thanks for your reply. For example, if we set the file group count to 10 and each HFile holds more than 10 million lines of col_index info, then we will use 10 tasks to scan the HFiles, and each task gets the records by prefix from 10 million lines. But if we divide them by time (year, month, day, etc.) and a query only hits 1 day, then we can read just that day's HFile, which will be much smaller.
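A toy model of the two layouts, to make the size comparison concrete. This is a sketch under stated assumptions, not Hudi's metadata-table code: the key shape `(partition date, file name)`, the CRC32-based assignment to file groups, and all function names are hypothetical. It counts how many index entries live in the files a one-day lookup has to open. It does not measure bytes actually read, since a sorted HFile can seek by key prefix.

```python
import zlib
from collections import defaultdict
from datetime import date, timedelta


def make_keys(num_days, files_per_day):
    """Hypothetical index keys: (partition date, data file name)."""
    start = date(2022, 1, 1)
    return [(start + timedelta(days=d), f"file-{d}-{i}")
            for d in range(num_days) for i in range(files_per_day)]


def hash_layout(keys, num_file_groups):
    """Spread keys over a fixed number of file groups by hashing the file name."""
    groups = defaultdict(list)
    for day, name in keys:
        groups[zlib.crc32(name.encode()) % num_file_groups].append((day, name))
    return groups


def time_layout(keys):
    """Hypothetical alternative: one index file per day."""
    groups = defaultdict(list)
    for day, name in keys:
        groups[day].append((day, name))
    return groups


def entries_touched_hash(groups, query_day):
    # query_day is unused: the hash layout has no date component, so no
    # file group can be skipped and every group must be opened.
    return sum(len(g) for g in groups.values())


def entries_touched_time(groups, query_day):
    # Only the file for the queried day is opened.
    return len(groups.get(query_day, []))


if __name__ == "__main__":
    keys = make_keys(num_days=365, files_per_day=1000)
    day = date(2022, 3, 1)
    print("hash layout:", entries_touched_hash(hash_layout(keys, 10), day))
    print("time layout:", entries_touched_time(time_layout(keys), day))
```

With a year of data, the hash layout opens files holding all 365,000 entries, while the per-day layout opens a single file holding 1,000. The trade-off in this model is that the number of index files now grows with the time range instead of staying fixed.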
