[
https://issues.apache.org/jira/browse/KYLIN-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
pengfei.zhan updated KYLIN-5745:
--------------------------------
Fix Version/s: 5.0-beta
(was: 5.0.0)
> The historical garbage cleanup task was not completed, causing the subsequent
> scheduled garbage cleanup task cannot be executed normally
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-5745
> URL: https://issues.apache.org/jira/browse/KYLIN-5745
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Major
> Fix For: 5.0-beta
>
>
> {*}Problem description{*}:
> The scheduled garbage cleanup task cannot complete successfully.
> {*}Background{*}:
> The customer found that a large number of small files generated by Kylin
> were occupying HDFS storage and needed to be cleaned up. When we checked the
> customer's environment, we found that the scheduled garbage cleanup had not
> been completing properly and kept timing out.
> *Troubleshooting:*
> After investigation, we found that the customer's first garbage cleanup after
> Kylin was restarted on the night of April 5 was triggered on the morning of
> April 6. Once this cleanup started, the query history cleanup thread kept
> deleting records from then on, so the subsequent periodic garbage cleanup
> tasks could not complete.
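The report does not say how the scheduler reacts when a previous cleanup run is still in progress. As a purely hypothetical illustration (the class and method names are invented here and are not Kylin's), a single-flight guard lets a new scheduled run detect that an earlier run is still active and skip it, rather than silently competing with it:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not Kylin code: allow at most one cleanup run at a time. */
public class SingleFlightCleanup {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Runs the cleanup unless a previous run is still active; returns false if this run was skipped. */
    public boolean tryRun(Runnable cleanup) {
        if (!running.compareAndSet(false, true)) {
            return false; // an earlier run still holds the guard: skip (real code would log this)
        }
        try {
            cleanup.run();
            return true;
        } finally {
            running.set(false); // release the guard even if the cleanup throws
        }
    }

    public static void main(String[] args) {
        SingleFlightCleanup guard = new SingleFlightCleanup();
        boolean[] overlapped = {true};
        // Simulate a second scheduled run arriving while the first is still executing.
        boolean first = guard.tryRun(() -> overlapped[0] = guard.tryRun(() -> { }));
        System.out.println("first run executed: " + first + ", overlapping run executed: " + overlapped[0]);
    }
}
```

Skipping only makes the overlap visible. It does not shorten the stuck run, so it would need to be combined with a cleanup that can actually stop.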
> Query history is deleted 2,000 rows at a time, and one of the customer's
> projects needed to delete 550,000 query history records. According to
> kylin.log, table locking made deletion so slow that a single delete
> operation took more than 20 minutes!
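To put these figures side by side, here is a small back-of-the-envelope calculation. It uses only the numbers in this report: 550,000 rows, 2,000 rows per batch, about 20 minutes per batch (a lower bound, since the report says "more than"), and the roughly 4-hour budget implied by "remaining time: 14399995 ms" in the log. The class name is invented for illustration.

```java
/** Hypothetical estimate using the figures reported in this issue. */
public class CleanupEstimate {
    /** Number of DELETE batches needed to remove {@code rows} rows, {@code batchSize} at a time (ceiling division). */
    public static long batches(long rows, long batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    /** Total minutes if every batch takes {@code minutesPerBatch}. */
    public static long totalMinutes(long rows, long batchSize, long minutesPerBatch) {
        return batches(rows, batchSize) * minutesPerBatch;
    }

    public static void main(String[] args) {
        long rows = 550_000L;                        // rows to delete in one project
        long batchSize = 2_000L;                     // rows per delete batch
        long minutesPerBatch = 20L;                  // observed time for a single batch
        long windowMinutes = 14_399_995L / 60_000L;  // routine-task budget from the log, in whole minutes
        System.out.printf("batches=%d, total=%d min, window=%d min%n",
                batches(rows, batchSize), totalMinutes(rows, batchSize, minutesPerBatch), windowMinutes);
        // prints: batches=275, total=5500 min, window=239 min
    }
}
```

At that rate the backlog would need about 5,500 minutes, more than 90 hours. That is over 20 times the 4-hour window, so no single scheduled run can finish it.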
> The following log shows the main garbage cleanup thread waiting for the query
> history cleanup to complete. The query history cleanup never finished, so the
> main thread timed out and exited.
> {code:shell}
> 2023-04-06T00:00:00,015 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 14399995 ms
> 2023-04-06T00:01:52,649 INFO [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
> ...
> 2023-04-06T04:00:00,012 WARN [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
> java.util.concurrent.TimeoutException: null
> at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
> at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
> at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
> at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke(<generated>) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
> {code}
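The stack trace shows ScheduleService.executeTask waiting in FutureTask.get with a timeout. In plain java.util.concurrent, a timed get() only stops the caller from waiting. The task itself keeps running unless the future is cancelled with cancel(true) and the task responds to interruption. This report does not show whether Kylin cancels the future. The following sketch uses hypothetical class names and is not Kylin's code. It only demonstrates the JDK behavior that is consistent with the logs.

```java
import java.util.concurrent.*;

/** Hypothetical sketch, not Kylin code: a timed Future.get() does not stop the task by itself. */
public class RoutineTimeoutSketch {

    /** Returns true if the task finished within budgetMs; on timeout, optionally cancels (interrupts) it. */
    static boolean runWithBudget(ExecutorService pool, Runnable task, long budgetMs, boolean cancelOnTimeout)
            throws InterruptedException, ExecutionException {
        Future<?> future = pool.submit(task);
        try {
            future.get(budgetMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // get() only gives up *waiting*; the worker thread keeps running unless cancelled.
            if (cancelOnTimeout) {
                future.cancel(true); // interrupts the worker thread
            }
            return false;
        }
    }

    /** A stand-in for a long cleanup that only stops when interrupted, then counts down {@code stopped}. */
    static Runnable endlessCleanup(CountDownLatch stopped) {
        return () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10); // stands in for one slow DELETE batch
                }
            } catch (InterruptedException ie) {
                // interrupted while sleeping: stop working
            } finally {
                stopped.countDown();
            }
        };
    }

    /** Returns true if the cleanup stopped within 500 ms after a 50 ms budget expired. */
    public static boolean stopsAfterTimeout(boolean cancelOnTimeout) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch stopped = new CountDownLatch(1);
        try {
            runWithBudget(pool, endlessCleanup(stopped), 50, cancelOnTimeout);
            return stopped.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // do not leak a running worker from the demo
        }
    }

    public static void main(String[] args) {
        System.out.println("without cancel, worker stopped: " + stopsAfterTimeout(false));
        System.out.println("with cancel(true), worker stopped: " + stopsAfterTimeout(true));
    }
}
```

Interruption is cooperative. cancel(true) only helps if the cleanup loop checks the interrupt flag or blocks in an interruptible call. A JDBC statement that is stuck waiting on a table lock may not react to it.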
> The following log, which runs up to the latest timestamp available, shows
> that after 9:00 am the query history cleanup was still deleting records; it
> did not terminate along with the main thread.
> {code:shell}
> 2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <== Total: 12
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CPIC_FRP> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXAIMA> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCDC> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCRMS> is less than the maximum limit, so skip it.
> 2023-04-06T00:08:43,017 INFO [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project<CXCZH>, records:1551669
> ...
> 2023-04-06T09:03:54,974 INFO [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
> {code}
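The log shows the cleanup worker still issuing 2,000-row deletes hours after the routine task's budget expired. A minimal sketch of a batch loop that respects a deadline follows. It assumes a hypothetical BatchDeleter hook that deletes up to N rows and returns how many it removed, and an injectable clock so the timing can be simulated. None of these names come from Kylin's QueryHisStoreUtil or JdbcQueryHistoryStore.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch, not Kylin code: batched deletion that stops between batches at a deadline. */
public class BoundedBatchCleanup {

    /** Hypothetical hook: deletes up to {@code limit} rows and returns how many were actually deleted. */
    @FunctionalInterface
    public interface BatchDeleter {
        int deleteBatch(int limit);
    }

    /** Returns total rows deleted before the backlog is cleared, the deadline passes, or the thread is interrupted. */
    public static long deleteUntilDeadline(BatchDeleter deleter, int batchSize, long deadlineMs, LongSupplier clock) {
        long total = 0;
        while (clock.getAsLong() < deadlineMs && !Thread.currentThread().isInterrupted()) {
            int deleted = deleter.deleteBatch(batchSize);
            total += deleted;
            if (deleted < batchSize) {
                break; // a partial (or empty) batch means nothing is left to delete
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] now = {0};             // simulated clock in ms
        long[] remaining = {550_000}; // rows still to delete (figure from this report)
        BatchDeleter slowDeleter = limit -> {
            now[0] += 20 * 60_000L;   // each batch "takes" 20 minutes, as reported
            int n = (int) Math.min(limit, remaining[0]);
            remaining[0] -= n;
            return n;
        };
        long fourHours = 4 * 60 * 60_000L;
        long deleted = deleteUntilDeadline(slowDeleter, 2_000, fourHours, () -> now[0]);
        System.out.println("deleted " + deleted + " rows, " + remaining[0] + " left for the next run");
        // prints: deleted 24000 rows, 526000 left for the next run
    }
}
```

The deadline is only checked between batches, so one slow batch can still overrun it. For example, the 938060 ms batch in the log would overrun by up to that long. The remaining rows are left for the next scheduled run instead of blocking it.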
>