Re: [PR] [AMORO-3252] Fix primary key duplicate exception when concurrently inserting table optimization process entries into database [amoro]

via GitHub Mon, 12 May 2025 00:27:21 -0700


Jzjsnow commented on PR #3520:
URL: https://github.com/apache/amoro/pull/3520#issuecomment-2871212139


   > Hi, Thanks for driving this.
   > 
   > After I listed all SQL statements related to `table_optimizing_process`, I
   > found that the conditions of some statements do not contain `table_name`,
   > which may cause poor performance, like:
   > https://github.com/apache/amoro/blob/master/amoro-ams/src/main/java/org/apache/amoro/server/persistence/mapper/OptimizingMapper.java#L147
   >
   > I think the `table_optimizing_process` might only need a globally unique
   > ID as the primary key, without necessarily including `table_name`. However,
   > the current ID generation rules do indeed carry a risk of duplication.
   >
   > Currently, we use the `currentTimestamp` generation rule, which I
   > understand is for easier cleanup operations. However, perhaps we should
   > optimize the current cleanup logic, for example, by partitioning this table
   > and using DROP PARTITION to perform cleanup more efficiently. #3445 tracks
   > the cleanup improvement.
   
   @zhoujinsong Thanks for the viewpoint. I think expiring entries for all
   tables at once during cleanup can greatly improve performance, as discussed
   in #3445. On the other hand, I think it is still necessary to keep `tableid`
   in the SQL queries against `table_optimizing_process`, `task_runtime`, and
   `optimizing_task_quota`, because we may still need to process the
   optimization information of a single table.
   
   For example, as mentioned in #3445, we should clean up all of a table's
   optimization information immediately when the table is dropped. Likewise, in
   #3554, when we clear orphan table information during table service
   initialization, we still need `tableid` to filter out the optimizing entries
   related to the invalid tables.
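
To make the point about per-table filtering concrete, here is a minimal in-memory sketch. It is not code from the PR or from Amoro: `OrphanProcessCleanup`, `ProcessEntry`, and `findOrphans` are hypothetical names, and a plain list stands in for the rows of `table_optimizing_process`. It only shows that orphan entries can be selected when each entry carries its table id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OrphanProcessCleanup {

    /** Hypothetical, simplified stand-in for one row of table_optimizing_process. */
    public static final class ProcessEntry {
        public final long processId; // e.g. a timestamp-based id, which may repeat across tables
        public final long tableId;   // id of the table this optimizing process belongs to

        public ProcessEntry(long processId, long tableId) {
            this.processId = processId;
            this.tableId = tableId;
        }
    }

    /** Returns the entries whose tableId is not in the set of still-valid tables. */
    public static List<ProcessEntry> findOrphans(List<ProcessEntry> entries,
                                                 Set<Long> validTableIds) {
        List<ProcessEntry> orphans = new ArrayList<>();
        for (ProcessEntry entry : entries) {
            if (!validTableIds.contains(entry.tableId)) {
                orphans.add(entry);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Two entries share the same processId but belong to different tables.
        List<ProcessEntry> entries = List.of(
            new ProcessEntry(1747034841000L, 1L),
            new ProcessEntry(1747034841000L, 2L),
            new ProcessEntry(1747034842000L, 3L));

        // Assume table 2 was dropped, so only tables 1 and 3 are still valid.
        List<ProcessEntry> orphans = findOrphans(entries, Set.of(1L, 3L));
        for (ProcessEntry orphan : orphans) {
            System.out.println("orphan process " + orphan.processId
                + " of table " + orphan.tableId);
        }
    }
}
```

Without the table id on each entry, the two entries that share a process id could not be told apart, and the orphan could not be selected without touching the valid table's entry.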


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

