guanziyue edited a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-994684618


   > Hi vinothchandar:
   Concurrent writes to HoodieParquetWriter happen in the following code:
   
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java#L103
   When speculation is triggered, we first call mergeHandle.close(), which calls the close method of parquetWriter. At the same time, boundedInMemoryExecutor is still running, so the write method of mergeHandle is being called concurrently, and it in turn calls the write method of parquetWriter.
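   A minimal sketch of this interleaving, plus one way to make it safe. Everything here is hypothetical and not Hudi's or Parquet's code: `GuardedWriter` stands in for parquetWriter, and the producer thread in `WriteAfterCloseDemo` only plays the role of boundedInMemoryExecutor. The assumption is that write() and close() share one lock and a closed flag, so a write that arrives after close() is rejected instead of touching state that close() has already reset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a non-thread-safe writer such as parquetWriter.
// This is NOT Hudi's or Parquet's API; it only illustrates the guard.
class GuardedWriter implements AutoCloseable {
  private final List<String> buffer = new ArrayList<>(); // internal state, unsafe to share
  private boolean closed = false;
  private int flushed = 0;

  // write() and close() share this object's lock, and write() checks the
  // closed flag, so a late call from the producer can never touch state
  // that close() has already flushed and reset.
  public synchronized void write(String record) {
    if (closed) {
      throw new IllegalStateException("write after close: " + record);
    }
    buffer.add(record);
  }

  @Override
  public synchronized void close() {
    if (closed) {
      return; // closing twice is a no-op
    }
    flushed = buffer.size(); // pretend to flush everything buffered so far
    buffer.clear();          // reset internal state, as a real writer would
    closed = true;
  }

  public synchronized boolean isClosed() {
    return closed;
  }

  public synchronized int flushedCount() {
    return flushed;
  }
}

class WriteAfterCloseDemo {
  public static void main(String[] args) throws InterruptedException {
    GuardedWriter writer = new GuardedWriter();

    // Hypothetical producer playing the role of boundedInMemoryExecutor:
    // it keeps calling write() until the writer rejects a record.
    Thread producer = new Thread(() -> {
      int attempts = 0;
      try {
        while (true) {
          attempts++;
          writer.write("record-" + attempts);
          Thread.sleep(1); // simulate per-record work
        }
      } catch (IllegalStateException e) {
        System.out.println("producer stopped after " + attempts + " attempts: " + e.getMessage());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    producer.start();
    Thread.sleep(20); // let some records through
    writer.close();   // close while the producer is still running
    producer.join();
    System.out.println("flushed " + writer.flushedCount() + " records, closed=" + writer.isClosed());
  }
}
```

   The exact number of records flushed depends on thread scheduling; what the guard guarantees is that no write lands after close() has started.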
   parquetWriter does hold state that is not thread-safe. It holds BytesInput objects, which parquet uses as internal data storage for its column format; they are not thread-safe, and their life cycle is controlled by parquetWriter. These data structures are reused within the JVM. So the parquet writer must not be written to after it is closed, to ensure that the relevant data structures are not written into after they have been reset.
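   The reuse hazard can be reproduced without threads. The single-threaded sketch below assumes a hypothetical `BufferPool` that resets a buffer on release and hands the same object to the next writer, loosely modeling the JVM-wide reuse of BytesInput storage described above; it is not Parquet's allocator, and `PooledWriter` and `BufferReuseDemo` are likewise illustrative names. Writer `a` still references its buffer after close(), so its late write ends up in writer `b`'s data. Constructing the writer with `rejectWriteAfterClose = true` turns that late write into an IllegalStateException instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical pool that hands the same buffer to successive writers,
// standing in for the JVM-wide reuse of column storage described above.
// This is NOT Parquet's allocator.
class BufferPool {
  private final Deque<List<String>> free = new ArrayDeque<>();

  synchronized List<String> acquire() {
    List<String> buf = free.poll();
    return buf != null ? buf : new ArrayList<String>();
  }

  synchronized void release(List<String> buf) {
    buf.clear(); // reset before the next writer gets it
    free.push(buf);
  }
}

// Hypothetical writer; deliberately not thread-safe, like the writer above.
class PooledWriter {
  private final BufferPool pool;
  private final boolean rejectWriteAfterClose;
  private final List<String> buffer;
  private boolean closed = false;

  PooledWriter(BufferPool pool, boolean rejectWriteAfterClose) {
    this.pool = pool;
    this.rejectWriteAfterClose = rejectWriteAfterClose;
    this.buffer = pool.acquire();
  }

  void write(String record) {
    if (closed && rejectWriteAfterClose) {
      throw new IllegalStateException("write after close: " + record);
    }
    // Without the guard, this may append to a buffer that another writer now owns.
    buffer.add(record);
  }

  List<String> close() {
    List<String> flushed = new ArrayList<>(buffer); // "flush" a copy of the data
    closed = true;
    pool.release(buffer); // buffer is reset and reused, but we still reference it
    return flushed;
  }

  List<String> contents() {
    return new ArrayList<>(buffer);
  }
}

class BufferReuseDemo {
  public static void main(String[] args) {
    BufferPool pool = new BufferPool();

    PooledWriter a = new PooledWriter(pool, false);
    a.write("a1");
    System.out.println("a flushed " + a.close()); // prints: a flushed [a1]

    PooledWriter b = new PooledWriter(pool, false); // receives a's reset buffer
    b.write("b1");
    a.write("a2-late"); // late write from the already-closed writer
    System.out.println("b contains " + b.contents()); // prints: b contains [b1, a2-late]
  }
}
```

   In the concurrent case the late write comes from the executor thread rather than from the same thread, which makes the corruption timing-dependent and harder to spot.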


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.