guanziyue edited a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-994684618
> Hi vinothchandar: The concurrent write to HoodieParquetWriter occurs at the following code: https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java#L103
>
> When speculation is triggered, we first call mergeHandle.close, which calls the parquetWriter's close method. At the same time, boundedInMemoryExecutor is still running, so mergeHandle's write method, which in turn calls the parquetWriter's write method, can be invoked concurrently with close. The parquetWriter holds state that is not thread safe: it holds BytesInput, which parquet uses as internal storage for column data and whose life cycle is controlled by the parquetWriter. These data structures are reused within the JVM, so the parquet writer must not be written to after it has been closed. Otherwise, records could be written into a data structure that has already been reset.
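
To illustrate the close/write race described in the comment, here is a minimal sketch in plain Java. `GuardedWriter` and all of its members are hypothetical names made up for this example; it is not Hudi's or Parquet's actual code, and the in-memory list only stands in for the reused column buffers. It shows one possible guard: serialize `write` and `close` on a lock and reject any write that arrives after close.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer whose internal buffers are not thread safe.
// Not part of Hudi or Parquet; names are assumptions made for illustration.
final class GuardedWriter implements AutoCloseable {
    private final Object lock = new Object();
    // Stands in for the reusable column buffers mentioned in the comment.
    private final List<String> buffer = new ArrayList<>();
    private boolean closed = false;
    private int flushed = 0;

    /** Appends a record; throws if the writer has already been closed. */
    void write(String record) {
        synchronized (lock) {
            if (closed) {
                throw new IllegalStateException("write after close");
            }
            buffer.add(record);
        }
    }

    /** Flushes and resets the buffer exactly once; later calls are no-ops. */
    @Override
    public void close() {
        synchronized (lock) {
            if (closed) {
                return;
            }
            closed = true;
            flushed = buffer.size();
            buffer.clear(); // after this point the buffer may be reused elsewhere
        }
    }

    /** Number of records that were in the buffer when close() ran. */
    int flushedCount() {
        synchronized (lock) {
            return flushed;
        }
    }
}
```

With a guard like this, a producer thread that is still running when close is called fails fast with an exception instead of touching a buffer that has been reset. An alternative under the same assumptions would be to stop the producer and wait for it to terminate before calling close at all.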
