kasured commented on issue #5843:
URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154047895

   We are not using clustering, so I am not sure that these far-in-the-future 
commits are caused by it. We have both inline and async clustering disabled: 
"hoodie.clustering.inline" = "false" and "hoodie.clustering.async.enabled" = 
"false".
   
   Again, we are trying to understand how these commits come to appear in the 
timeline. One possible cause I am considering is that `HoodieActiveTimeline` 
uses the non-thread-safe `java.text.SimpleDateFormat` when generating the 
timeline instant.
   
   I see that this was reworked in 0.10.0 to remove the issue, in 
https://issues.apache.org/jira/browse/HUDI-2831 and 
https://github.com/apache/hudi/pull/4073. But we are on version 0.9.0, which 
may well be susceptible to this subtle problem. Even in the single-writer 
scenario it looks like COMMIT_FORMATTER can be accessed by multiple 
threads. In our scenario we create multiple Streaming Queries in a single 
Spark application, which can amplify such concurrency issues.
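
   To illustrate the concern, here is a minimal standalone sketch. It is not 
Hudi code: the class name `InstantTimeDemo`, the helper methods, the thread 
counts, and the `yyyyMMddHHmmss` pattern pinned to UTC are all assumptions made 
for the demo. It hammers one shared `SimpleDateFormat` from several threads and 
counts outputs that differ from the expected string, then does the same with 
an immutable `java.time.format.DateTimeFormatter`.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

public class InstantTimeDemo {
    // Assumed pattern for the demo; pinned to UTC so results are reproducible.
    static final String PATTERN = "yyyyMMddHHmmss";

    // One mutable formatter shared by all threads: SimpleDateFormat is not thread-safe.
    static final SimpleDateFormat SHARED = new SimpleDateFormat(PATTERN);
    static {
        SHARED.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    // DateTimeFormatter is immutable and safe to share between threads.
    static final DateTimeFormatter SAFE =
            DateTimeFormatter.ofPattern(PATTERN).withZone(ZoneOffset.UTC);

    static String unsafeFormat(long epochMillis) {
        return SHARED.format(new Date(epochMillis));
    }

    static String safeFormat(long epochMillis) {
        return SAFE.format(Instant.ofEpochMilli(epochMillis));
    }

    // Runs the formatter from 8 threads over two fixed inputs and counts
    // results that differ from the expected string (or that throw).
    static int countCorrupted(LongFunction<String> formatter) {
        long[] inputs = {0L, 1_654_000_000_000L};
        String[] expected = {safeFormat(inputs[0]), safeFormat(inputs[1])};
        AtomicInteger corrupted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int idx = t % 2;
            pool.submit(() -> {
                for (int i = 0; i < 20_000; i++) {
                    try {
                        if (!expected[idx].equals(formatter.apply(inputs[idx]))) {
                            corrupted.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        // A racing SimpleDateFormat can also throw internally.
                        corrupted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        // The shared SimpleDateFormat count varies from run to run and may be 0.
        System.out.println("shared SimpleDateFormat: "
                + countCorrupted(InstantTimeDemo::unsafeFormat));
        System.out.println("DateTimeFormatter:       "
                + countCorrupted(InstantTimeDemo::safeFormat));
    }
}
```

   The race is timing-dependent, so a clean run of the shared formatter proves 
nothing. When it does misbehave, it can return a timestamp that mixes fields 
from two different inputs, which would be one way for an instant to land far 
in the future.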

