Jzjsnow commented on PR #3520:
URL: https://github.com/apache/amoro/pull/3520#issuecomment-2890510403

   > I think the `table_optimizing_process` might only need a globally unique 
ID as the primary key, without necessarily including `table_name`. However, the 
current ID generation rules do indeed carry a risk of duplication.
   
   Thanks for the tip about the `process_id` unique key. I have since looked
into how the `table_optimizing_process` table is used in more depth and found
that `process_id` is critical as a primary key, for example when canceling an
optimizing process from the front end or updating the `taskRuntime`.
   
   I agree that using timestamps as the `process_id` is better for the
performance of the cleanup process, and I think there is a clearer, more
concise way to solve the current duplicate-timestamp problem. The duplicate
`process_id` values occur because `TableOptimizingProcess#planInternal()` runs
with high concurrency, so several processes can pick up the same timestamp.
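The collision can be reproduced in a small, hypothetical Java sketch (not Amoro code; `NaiveProcessIdDemo` and its `planInternal` are illustrative stand-ins). Each task takes its ID from the clock inside the asynchronously executed task. To keep the result deterministic, a simulated clock that returns the same value for every four reads stands in for several threads reading `System.currentTimeMillis()` within the same millisecond.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class NaiveProcessIdDemo {

    // Hypothetical stand-in for planInternal(): the process ID is read from the
    // clock *inside* the task, i.e. after it has already gone asynchronous.
    static long planInternal(LongSupplier clock) {
        return clock.getAsLong();
    }

    // Runs `tasks` planning calls on a thread pool and returns how many distinct IDs came back.
    static int distinctIds(int tasks, LongSupplier clock) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> planInternal(clock)));
            }
            Set<Long> ids = new HashSet<>();
            for (Future<Long> f : futures) {
                ids.add(f.get());
            }
            return ids.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated millisecond clock (assumption): every 4 reads share one value.
        AtomicLong reads = new AtomicLong();
        LongSupplier coarseClock = () -> 1_700_000_000_000L + reads.getAndIncrement() / 4;
        // 8 plans, but only 2 distinct IDs: the other 6 are duplicates.
        System.out.println("distinct IDs for 8 plans: " + distinctIds(8, coarseClock));
    }
}
```

With a real clock the number of collisions depends on scheduling, but the mechanism is the same: any two tasks that read the clock within one millisecond get the same ID.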
   
   In the latest commit, I generate the timestamp-based `process_id` before
the work is handed off for asynchronous execution, to ensure that each
`process_id` is unique. PTAL.
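A minimal sketch of the "assign the ID before going async" idea, under stated assumptions: `TimestampIdAllocator`, `planInternal(long)` and the simulated clock are hypothetical names, and this is not the code from the commit. The ID is allocated on the submitting thread, and the asynchronous task only receives the already-fixed value. The `Math.max(now, last + 1)` bump is an added assumption. Taking the timestamp before dispatch removes the concurrency, and the bump also keeps two allocations made within the same millisecond distinct.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class PreassignedProcessIdDemo {

    // Hypothetical timestamp-based allocator. `synchronized` keeps it safe even if
    // more than one scheduler thread were to call next().
    static final class TimestampIdAllocator {
        private final LongSupplier clock;
        private long last = Long.MIN_VALUE;

        TimestampIdAllocator(LongSupplier clock) {
            this.clock = clock;
        }

        synchronized long next() {
            long now = clock.getAsLong();
            // Assumption: bump past the last issued ID when the clock has not advanced.
            last = Math.max(now, last + 1);
            return last;
        }
    }

    // Hypothetical stand-in for planInternal(): it uses the pre-assigned ID as-is.
    static long planInternal(long processId) {
        return processId;
    }

    static int distinctIds(int tasks, LongSupplier clock) throws Exception {
        TimestampIdAllocator allocator = new TimestampIdAllocator(clock);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                long processId = allocator.next(); // fixed before the task goes async
                futures.add(pool.submit(() -> planInternal(processId)));
            }
            Set<Long> ids = new HashSet<>();
            for (Future<Long> f : futures) {
                ids.add(f.get());
            }
            return ids.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same simulated clock as before: every 4 reads share one millisecond value.
        AtomicLong reads = new AtomicLong();
        LongSupplier coarseClock = () -> 1_700_000_000_000L + reads.getAndIncrement() / 4;
        System.out.println("distinct IDs for 8 plans: " + distinctIds(8, coarseClock));
    }
}
```

Here all 8 IDs are distinct. They also remain ordered by allocation time, which preserves the property that makes timestamps convenient for the cleanup process.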


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.