hudi-bot opened a new issue, #15000: URL: https://github.com/apache/hudi/issues/15000
When inline reading is enabled (that is, `hoodie.metadata.enable.full.scan.log.files = false`), `MetadataMergedLogRecordReader` does not cache the file listing records in the `ExternalSpillableMap`. As a result, every file listing re-reads the log and base files of the metadata table's files partition. Since the files partition is small, the TimelineServer should construct the FSViewManager with inline reading disabled for the metadata files partition, even when inline reading is enabled.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-3300
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-1292
- Fix version(s):
  - 1.1.0

---

## Comments

22/Jan/22 07:40;manojg;Verified the timeline server: it has reuse enabled, so the readers it opens are retained and the listed files are cached at a higher level, which lets it serve requests faster. But for all other cases, when `reuse` is false, we should not be caching the reader handles at all. Today we cache the readers and then close them towards the end. Multiple non-`reuse` requests may arrive at the same time and use the same readers. Full scan vs. inline scan is a totally different problem and I am not going there. In the worst case, two non-`reuse` requests can share the same reader, along with the latest file slice that was retrieved when the readers were cached. If for any reason the file slice moves forward because of concurrent upserts, the readers would not know about it and would read only the old file slices.

Siva: To my understanding, you are mostly right in your explanation, but tell me something. When the second reader comes through, if the latest commit time hasn't changed, why would there be new updates to the file slice? Conversely, if there were updates to the latest file slice (new log appends), the latest commit time would have been updated, and so the caller should have re-initialized the file system view, right? Or does this re-initialization happen only in the case of the timeline server, while elsewhere we don't keep refreshing the file system view? I know this is very tricky; we definitely need a good understanding of every nitty-gritty detail here.

---

23/Aug/22 01:34;xushiyan;[~guoyihua]: can be closed after verification
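To illustrate the caching behavior discussed above, here is a minimal Python sketch under stated assumptions. It is not Hudi code: `CommitKeyedListingCache`, `list_files`, and the loader callback are hypothetical names. The sketch caches per-partition file listings so repeated listings do not re-scan the files partition. Following Siva's reasoning, it treats a change in the latest commit time as the signal to drop every cached listing, so a listing is never reused after the latest file slice could have moved forward. It assumes callers always pass the current latest commit time. If they do not refresh it, the same staleness problem remains.

```python
import threading
from typing import Callable, Dict, List, Optional

# Hypothetical loader: (partition, commit_time) -> file names. In Hudi terms it
# stands in for scanning the metadata table's files partition (base + log files).
Loader = Callable[[str, str], List[str]]


class CommitKeyedListingCache:
    """Caches per-partition file listings for a single latest commit time."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._lock = threading.Lock()  # concurrent requests share one cache
        self._commit_time: Optional[str] = None
        self._listings: Dict[str, List[str]] = {}
        self.loads = 0  # how many times the expensive loader actually ran

    def list_files(self, partition: str, latest_commit_time: str) -> List[str]:
        with self._lock:
            # Any change in the latest commit time may mean new log appends to
            # the latest file slice, so discard all cached listings instead of
            # serving stale file slices.
            if latest_commit_time != self._commit_time:
                self._listings.clear()
                self._commit_time = latest_commit_time
            if partition not in self._listings:
                self.loads += 1
                self._listings[partition] = self._loader(partition, latest_commit_time)
            # Return a copy so callers cannot mutate the cached entry.
            return list(self._listings[partition])


if __name__ == "__main__":
    table = {
        ("2022/01/22", "001"): ["f1_001.parquet"],
        ("2022/01/22", "002"): ["f1_001.parquet", ".f1_001.log.1"],
    }
    cache = CommitKeyedListingCache(lambda p, c: table[(p, c)])
    cache.list_files("2022/01/22", "001")  # first listing: loader runs
    cache.list_files("2022/01/22", "001")  # same commit: served from the cache
    print(cache.list_files("2022/01/22", "002"))  # commit advanced: reload
    print(cache.loads)  # prints 2
```

The single lock also serializes the loader calls, which keeps the sketch simple. A real implementation would likely want finer-grained locking so that one slow listing does not block requests for other partitions.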
