hudi-bot opened a new issue, #15000:
URL: https://github.com/apache/hudi/issues/15000

   When inline reading is enabled, that is, 
hoodie.metadata.enable.full.scan.log.files = false, 
MetadataMergedLogRecordReader doesn't cache the file listing records via the 
ExternalSpillableMap. So every file listing leads to re-reading the log and 
base files of the metadata files partition. Since the files partition is small 
in size, the TimelineServer should construct the FSViewManager with inline 
reading disabled for the metadata files partition, even when inline reading is 
enabled. 
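
A minimal sketch of the cost difference described above. This is not Hudi code: `ToyMetadataReader`, `scan_count`, and the `full_scan` flag are hypothetical stand-ins for MetadataMergedLogRecordReader and the hoodie.metadata.enable.full.scan.log.files setting, and a plain dict stands in for the ExternalSpillableMap. It only illustrates that, without a cache, each listing repeats the file scan.

```python
class ToyMetadataReader:
    """Hypothetical stand-in for a metadata files-partition reader (not a Hudi API)."""

    def __init__(self, stored_records, full_scan):
        self._stored = dict(stored_records)  # pretend this lives in log and base files
        self._full_scan = full_scan
        self._cache = None                   # filled once, only when full_scan is True
        self.scan_count = 0                  # how many times the files were read

    def _scan_files(self):
        # Simulates the expensive read of the log and base files.
        self.scan_count += 1
        return dict(self._stored)

    def list_files(self, partition):
        if self._full_scan:
            if self._cache is None:
                self._cache = self._scan_files()
            return self._cache.get(partition, [])
        # Inline reading: nothing is cached, so every lookup scans again.
        return self._scan_files().get(partition, [])


records = {"2022/01/22": ["f1.parquet", "f2.parquet"]}
inline = ToyMetadataReader(records, full_scan=False)
cached = ToyMetadataReader(records, full_scan=True)

for _ in range(3):
    inline.list_files("2022/01/22")
    cached.list_files("2022/01/22")

print(inline.scan_count, cached.scan_count)
```

In this toy model the inline reader scans once per listing while the cached reader scans once in total, which is the behavior the issue proposes for the small files partition.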
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-3300
   - Type: Bug
   - Epic: https://issues.apache.org/jira/browse/HUDI-1292
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   22/Jan/22 07:40;manojg;Verified the timeline server: it has reuse enabled, 
the readers it opens are retained, and the listed files are cached at a higher 
level, thereby serving requests faster. But in all other cases, when {{reuse}} 
is false for the readers, we should not be caching the reader handles at all. 
Today we cache the readers and then close them towards the end. It is possible 
for multiple non-{{reuse}} requests to arrive at the same time and use the same 
readers. Full scan vs. inline scan is a totally different problem and I am not 
going there. In the worst case, two non-{{reuse}} requests can use the same 
reader, along with the latest file slice retrieved at the time the readers were 
cached. If for any reason the file slice moves forward because of concurrent 
upserts, the readers wouldn't know about this and would only read the old file 
slices.
   
    
   
   Siva: 
   To my understanding, you are mostly right in your explanation. But tell me 
something: when the 2nd reader comes through, if the latest commit time hasn't 
changed, why would there be new updates to the file slice? Conversely, if there 
were updates to the latest file slice (with new log appends), the latest commit 
time would have been updated, so the caller should have re-initialized the file 
system view, right? Or does this re-initialization happen only in the case of 
the timeline server, and at other places we don't keep refreshing the 
fileSystemView? I know this is very tricky. We definitely need to get a good 
understanding of every nitty-gritty detail here.
    ;;;
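
The stale-reader scenario from this comment can be sketched as follows. This is an illustration under assumptions, not Hudi code: `ToyTable`, `ToyReader`, `ReaderPool`, and `cache_readers` are hypothetical names, and the sketch assumes the file system view is not refreshed between the two requests, which is exactly the open question raised above.

```python
class ToyTable:
    """Hypothetical table whose latest file slice advances on each upsert."""

    def __init__(self):
        self.latest_slice = 1

    def upsert(self):
        self.latest_slice += 1


class ToyReader:
    """Pins the file slice that was latest when the reader was opened."""

    def __init__(self, table):
        self.pinned_slice = table.latest_slice


class ReaderPool:
    """Hands out readers, optionally caching one reader per partition."""

    def __init__(self, table, cache_readers):
        self._table = table
        self._cache_readers = cache_readers
        self._cached = {}

    def get_reader(self, partition):
        if not self._cache_readers:
            return ToyReader(self._table)  # fresh reader sees the current slice
        if partition not in self._cached:
            self._cached[partition] = ToyReader(self._table)
        return self._cached[partition]     # may be pinned to an old slice


table = ToyTable()
shared = ReaderPool(table, cache_readers=True)
fresh = ReaderPool(table, cache_readers=False)

first = shared.get_reader("files")
table.upsert()                             # concurrent write moves the slice forward
second = shared.get_reader("files")
uncached = fresh.get_reader("files")

print(second is first, second.pinned_slice, uncached.pinned_slice)
```

In this toy model the second request gets the same cached reader and still sees slice 1, while an uncached reader sees slice 2. If the caller does re-initialize the view whenever the commit time changes, the cached path would be rebuilt and the staleness would not occur.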
   
   ---
   
   23/Aug/22 01:34;xushiyan;[~guoyihua] : can be closed after verification;;;

