Jzjsnow commented on PR #3520: URL: https://github.com/apache/amoro/pull/3520#issuecomment-2871212139
> Hi, Thanks for driving this.
>
> After I listed all SQL related to `table_optimizing_process`, I found that the condition of some SQL does not contain `table_name`, which may cause poor performance, like: https://github.com/apache/amoro/blob/master/amoro-ams/src/main/java/org/apache/amoro/server/persistence/mapper/OptimizingMapper.java#L147
>
> I think the `table_optimizing_process` might only need a globally unique ID as the primary key, without necessarily including `table_name`. However, the current ID generation rules do indeed carry a risk of duplication.
>
> Currently, we use the `currentTimestamp` generation rule, which I understand is for easier cleanup operations. However, perhaps we should optimize the current cleanup logic, for example, by partitioning this table and using DROP PARTITION to perform cleanup more efficiently. #3445 is following the cleaning improvement issue.

@zhoujinsong Thanks for the viewpoint. I think expiring the data of all tables together, as part of the cleanup optimization, can greatly improve performance, as discussed in #3445.

On the other hand, I think it is still necessary to keep `tableid` in the SQL queries against the `table_optimizing_process`, `task_runtime`, and `optimizing_task_quota` tables, because we may still need to process the optimization information of a single table. For example, as mentioned in #3445, we had better clean up all the optimization information of a table immediately when it is dropped. Likewise, in #3554, when we clear orphan table information during table service initialization, we still need `tableid` to filter out the optimizing entries related to the invalid tables.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
