kasured commented on issue #5843: URL: https://github.com/apache/hudi/issues/5843#issuecomment-1154047895
We are not using clustering, so I doubt these far-in-the-future commits are caused by it. We have both inline and async clustering disabled:

"hoodie.clustering.inline" = "false"
"hoodie.clustering.async.enabled" = "false"

Again, we are trying to understand how these commits end up in the timeline.

I suspect this may be the impact of `HoodieActiveTimeline` using `java.text.SimpleDateFormat`, which is not thread-safe, when generating the timeline instant. I see that 0.10.0 was reworked to remove that issue in https://issues.apache.org/jira/browse/HUDI-2831 and https://github.com/apache/hudi/pull/4073, but we are on version 0.9.0, which may well be susceptible to this subtle threat. Even in the single-writer scenario, it looks like COMMIT_FORMATTER can be accessed by multiple threads. In our scenario we create multiple Streaming Queries in a single Spark application, which can amplify such concurrency issues.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
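
To make the thread-safety concern above concrete, here is a minimal sketch of a formatter that is safe to share across threads. It is not Hudi's code and not necessarily what HUDI-2831 changed: the class name `InstantTimeSketch`, the helper methods, and the `yyyyMMddHHmmss` pattern are assumptions for illustration only. It relies on the documented fact that `java.time.format.DateTimeFormatter` is immutable and thread-safe, whereas a single `java.text.SimpleDateFormat` instance must not be used by several threads without external synchronization.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InstantTimeSketch {
    // Assumed instant-time pattern, chosen for illustration only.
    static final String PATTERN = "yyyyMMddHHmmss";

    // DateTimeFormatter is immutable and thread-safe, so one shared
    // instance can be used by every thread without locking.
    static final DateTimeFormatter SAFE_FORMATTER = DateTimeFormatter.ofPattern(PATTERN);

    // Hypothetical helper: formats a timestamp as an instant-time string.
    static String formatInstant(LocalDateTime time) {
        return SAFE_FORMATTER.format(time);
    }

    // Formats the same timestamp from several threads and collects every result.
    static List<String> formatConcurrently(LocalDateTime time, int threads, int callsPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<List<String>> task = () -> {
                    List<String> out = new ArrayList<>();
                    for (int i = 0; i < callsPerThread; i++) {
                        out.add(formatInstant(time));
                    }
                    return out;
                };
                futures.add(pool.submit(task));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        LocalDateTime fixed = LocalDateTime.of(2022, 6, 13, 12, 30, 45);
        List<String> results = formatConcurrently(fixed, 8, 1000);
        long distinct = results.stream().distinct().count();
        System.out.println(results.size() + " results, " + distinct + " distinct value(s)");
    }
}
```

Because every thread formats the same fixed timestamp, any result that differs from the others would indicate corrupted formatter state. Swapping the shared formatter for a shared `SimpleDateFormat` in an experiment like this is one way to probe whether a given workload can trigger the problem, though a race of that kind may not reproduce on every run.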
