[ 
https://issues.apache.org/jira/browse/HUDI-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480362#comment-17480362
 ] 

Manoj Govindassamy commented on HUDI-3300:
------------------------------------------

Verified the timeline server: it has reuse enabled, so the opened readers are 
retained and the listed files are cached at the higher level, thereby serving 
requests faster. In all other cases, when {{reuse}} is false, we should not 
cache the reader handles at all. Today we cache the readers and close them 
only towards the end, so multiple non-{{reuse}} requests arriving at the same 
time can end up using the same readers. Full scan vs. inline scan is a totally 
different problem and I am not going there. In the worst case, two 
non-{{reuse}} requests share the same reader and the latest file slice that 
was retrieved when the readers were cached. If for any reason the file slice 
moves forward because of concurrent upserts, the readers would not know about 
it and would read only the old file slices.
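
To make the proposal above concrete, here is a minimal sketch of "cache only 
when {{reuse}} is true". It is not Hudi code: SliceReader, ReaderProvider, 
advanceSlice, open and release are hypothetical names chosen for illustration, 
and the file slice is reduced to an instant string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a metadata reader handle; not a Hudi class.
final class SliceReader implements AutoCloseable {
    final String partition;
    final String sliceInstant; // file slice seen when the reader was opened
    private volatile boolean closed;

    SliceReader(String partition, String sliceInstant) {
        this.partition = partition;
        this.sliceInstant = sliceInstant;
    }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

// Hypothetical provider: caches handles only when reuse is enabled.
final class ReaderProvider {
    private final boolean reuse;
    private final Map<String, SliceReader> cache = new ConcurrentHashMap<>();
    private final Map<String, String> latestSlice = new ConcurrentHashMap<>();

    ReaderProvider(boolean reuse) { this.reuse = reuse; }

    // Simulates a concurrent upsert moving the latest file slice forward.
    void advanceSlice(String partition, String instant) {
        latestSlice.put(partition, instant);
    }

    // With reuse, hand out the cached handle; without it, open a fresh one
    // so that no two requests ever share a reader.
    SliceReader open(String partition) {
        if (reuse) {
            return cache.computeIfAbsent(partition, this::newReader);
        }
        return newReader(partition);
    }

    // Non-reuse handles are closed by the request that opened them.
    void release(SliceReader reader) {
        if (!reuse) {
            reader.close();
        }
    }

    private SliceReader newReader(String partition) {
        return new SliceReader(partition, latestSlice.get(partition));
    }
}
```

In this sketch a non-{{reuse}} request always sees the file slice that is 
current when it opens its reader, at the cost of opening a reader per request. 
The reuse path keeps the handle it cached first, so it relies on a higher 
level to refresh it.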

 

Siva: 
To my understanding, you are mostly right in your explanation, but tell me 
something. When the second reader comes through, if the latest commit time 
hasn't changed, why would there be new updates to the file slice? On the 
contrary, if there were updates to the latest file slice (with new log 
appends), the latest commit time would have been updated, so the caller should 
have re-initialized the file system view, right? Or does this 
re-initialization happen only in the case of the timeline server, while at 
other places we don't keep refreshing the fileSystemView? I know this is very 
tricky. We definitely need a good understanding of every nitty-gritty detail 
here.
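
The re-initialization Siva describes could look roughly like the sketch below: 
rebuild the view only when the latest commit time has moved. This is an 
assumption about the intended behaviour, not the actual fileSystemView code; 
RefreshingView and its methods are hypothetical names.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical holder that re-initializes its view only when the latest
// commit time reported by the timeline has changed; not a Hudi class.
final class RefreshingView {
    private final Supplier<String> latestCommitTime;
    private String viewBuiltAt; // commit time the current view was built for
    private int rebuilds;

    RefreshingView(Supplier<String> latestCommitTime) {
        this.latestCommitTime = latestCommitTime;
    }

    // Returns the commit time the view reflects, rebuilding first if the
    // timeline has moved forward since the last call.
    synchronized String currentViewInstant() {
        String latest = latestCommitTime.get();
        if (!Objects.equals(latest, viewBuiltAt)) {
            viewBuiltAt = latest; // stands in for re-initializing the view
            rebuilds++;
        }
        return viewBuiltAt;
    }

    synchronized int rebuildCount() {
        return rebuilds;
    }
}
```

If callers outside the timeline server never make a check like this, a cached 
reader can keep serving the old file slice even after the commit time has 
advanced, which is the case the question is probing.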
 

> Timeline server FSViewManager should avoid inline reading for metadata file 
> partition
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-3300
>                 URL: https://issues.apache.org/jira/browse/HUDI-3300
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Manoj Govindassamy
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> When inline reading is enabled, that is, 
> hoodie.metadata.enable.full.scan.log.files = false, 
> MetadataMergedLogRecordReader doesn't cache the file listing records via the 
> ExternalSpillableMap. So every file listing leads to re-reading the log and 
> base files of the metadata files partition. Since the files partition is 
> small, the TimelineServer should construct the FSViewManager with inline 
> reading disabled for the metadata files partition, even when inline reading 
> is enabled elsewhere.
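
A rough sketch of the override described in the issue, using plain 
java.util.Properties rather than Hudi's config classes. Only the key 
hoodie.metadata.enable.full.scan.log.files comes from the description; the 
class and method names are hypothetical, and how the timeline server would 
actually pass this to the FSViewManager is not shown.

```java
import java.util.Properties;

// Hypothetical helper; not part of Hudi.
final class TimelineServerMetadataProps {
    static final String FULL_SCAN_KEY = "hoodie.metadata.enable.full.scan.log.files";

    // Copies the writer's properties and forces full scan on (that is,
    // inline reading off) for the view the timeline server builds.
    static Properties forFilesPartition(Properties writerProps) {
        Properties copy = new Properties();
        copy.putAll(writerProps);
        copy.setProperty(FULL_SCAN_KEY, "true");
        return copy;
    }
}
```

Copying instead of mutating keeps the writer's own setting intact, so inline 
reading can stay enabled outside the timeline server.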



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
