Jzjsnow commented on PR #3520: URL: https://github.com/apache/amoro/pull/3520#issuecomment-2890510403
> I think the `table_optimizing_process` might only need a globally unique ID as the primary key, without necessarily including `table_name`. However, the current ID generation rules do indeed carry a risk of duplication.

Thanks for the tip about the `process_id` unique key. I have since looked in depth at how the `table_optimizing_process` table is used, and found that it is critical for `process_id` to serve as the primary key: it is used, for example, when canceling an optimizing process from the front end and when updating the taskRuntime.

I agree that using a timestamp as `process_id` is better for the performance of the cleanup process, and I think there is a clearer and more concise fix for the duplicate timestamps. The duplicate `process_id` values arise from highly concurrent execution of `TableOptimizingProcess#planInternal()`. In the latest commit, I generate the `process_id` from a timestamp before the asynchronous step to ensure that it is unique. PTAL.
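
To illustrate the idea, here is a minimal sketch of allocating a timestamp-based ID on the calling thread before the asynchronous work is submitted. It is not the code from the commit: `ProcessIdSketch`, `nextProcessId`, `planAsync`, and `countDistinctIds` are hypothetical names, and the `AtomicLong` guard that bumps the value when two calls land in the same millisecond is an added assumption rather than something this PR is known to do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names come from the Amoro code base.
class ProcessIdSketch {
  // Last ID handed out; lets us keep IDs strictly increasing.
  private static final AtomicLong LAST_ID = new AtomicLong();

  // Timestamp-based ID. If the clock has not advanced since the previous
  // call, fall back to last + 1 so that two callers never share an ID.
  static long nextProcessId() {
    long now = System.currentTimeMillis();
    return LAST_ID.updateAndGet(last -> Math.max(last + 1, now));
  }

  // The ID is fixed on the calling thread, before the async hand-off,
  // so concurrently running plan tasks cannot pick the same timestamp.
  static CompletableFuture<Long> planAsync(ExecutorService pool) {
    long processId = nextProcessId();
    return CompletableFuture.supplyAsync(() -> processId, pool); // stand-in for planning
  }

  // Submits n plan tasks and returns how many distinct IDs they received.
  static int countDistinctIds(int n) {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<CompletableFuture<Long>> futures = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        futures.add(planAsync(pool));
      }
      Set<Long> ids = new HashSet<>();
      for (CompletableFuture<Long> future : futures) {
        ids.add(future.join());
      }
      return ids.size();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(countDistinctIds(1000) + " distinct ids out of 1000 tasks");
  }
}
```

Because the IDs stay roughly time-ordered, a cleanup job could still filter old rows by `process_id` range, which is the performance benefit mentioned above.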
