Manoj Govindassamy created HUDI-3301:
----------------------------------------

             Summary: Metadata table inline reading should be stateless and 
thread safe
                 Key: HUDI-3301
                 URL: https://issues.apache.org/jira/browse/HUDI-3301
             Project: Apache Hudi
          Issue Type: Task
            Reporter: Manoj Govindassamy
            Assignee: Ethan Guo
             Fix For: 0.11.0


Metadata table inline reading (enable.full.scan.log.files = false) currently 
alters instance member fields and is not thread safe.

 

When inline reading is enabled, HoodieMetadataMergedLogRecordReader does not do 
a full read of the log and base files and does not fill the ExternalSpillableMap 
records cache. Each getRecordsByKeys() call therefore re-reads the log and base 
files, by design. The problem is that this read alters instance members, even 
though the records it fills in are relevant only to that request. A concurrent 
getRecordsByKeys() call modifies the same member variable, which leads to an NPE.
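
The hazard described above can be illustrated with a minimal Java sketch. The 
class SharedStateReaderSketch, its fields, and the split into scan() and 
collect() are hypothetical and do not mirror the actual 
HoodieMetadataMergedLogRecordReader code. The sketch only assumes that a 
per-request buffer lives in an instance field and is released at the end of 
each request, which is one plausible way such an NPE can arise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader that keeps per-request results in an instance field.
class SharedStateReaderSketch {
    // Stand-in for the log and base files (assumption for illustration).
    private final Map<String, String> storage;
    // Per-request buffer, but shared by every caller of this instance.
    private Map<String, String> records;

    SharedStateReaderSketch(Map<String, String> storage) {
        this.storage = storage;
    }

    // Step 1: re-read storage for the requested keys into the shared field.
    void scan(List<String> keys) {
        records = new HashMap<>();
        for (String key : keys) {
            String value = storage.get(key);
            if (value != null) {
                records.put(key, value);
            }
        }
    }

    // Step 2: copy out the buffer and release the shared field.
    Map<String, String> collect() {
        // Throws NullPointerException if another request already released it.
        Map<String, String> result = new HashMap<>(records);
        records = null;
        return result;
    }

    // Unsynchronized lookup: another caller may run between the two steps.
    Map<String, String> getRecordsByKeys(List<String> keys) {
        scan(keys);
        return collect();
    }
}
```

Replaying the interleaving "A scans, B scans, B collects, A collects" by hand 
on a single thread makes the last collect() throw a NullPointerException, 
because request B has already released the field that request A still depends 
on.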

 

To avoid this, a temporary fix that makes getRecordsByKeys() a synchronized 
method has been pushed to master. But this fix does not cover all use cases. We 
need to make the whole class stateless and thread safe for inline reading.
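
A minimal Java sketch of what "stateless" could look like for a keyed lookup 
follows. StatelessReaderSketch and its in-memory storage map are hypothetical 
stand-ins, not the Hudi implementation. The only point is that every 
per-request structure is a local variable, so concurrent calls share nothing 
mutable and need no synchronized keyword.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stateless reader: no mutable instance fields.
class StatelessReaderSketch {
    // Read-only stand-in for the log and base files (assumption for illustration).
    private final Map<String, String> storage;

    StatelessReaderSketch(Map<String, String> storage) {
        // Defensive copy wrapped as unmodifiable, so no caller can mutate it later.
        this.storage = Collections.unmodifiableMap(new HashMap<>(storage));
    }

    // Each call builds and returns its own result map.
    Map<String, String> getRecordsByKeys(List<String> keys) {
        Map<String, String> records = new HashMap<>(); // request-local state
        for (String key : keys) {
            String value = storage.get(key);
            if (value != null) {
                records.put(key, value);
            }
        }
        return records;
    }
}
```

Compared with a synchronized method, this shape lets lookups run in parallel 
and keeps one request's records from leaking into another, at the cost of 
allocating a fresh result map per call.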



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
