zhong.zhu created KYLIN-5745:
--------------------------------

             Summary: The historical garbage cleanup task did not complete, 
preventing subsequent scheduled garbage cleanup tasks from executing 
normally
                 Key: KYLIN-5745
                 URL: https://issues.apache.org/jira/browse/KYLIN-5745
             Project: Kylin
          Issue Type: Bug
    Affects Versions: 5.0-beta
            Reporter: zhong.zhu
            Assignee: zhong.zhu
             Fix For: 5.0.0


{*}Problem description{*}: 
The scheduled garbage cleanup operation cannot complete successfully.


{*}Background{*}: 
The customer found that Kylin had a large number of small files occupying HDFS 
storage that needed to be cleaned up. When we checked the customer's 
environment, we found that the scheduled garbage cleanup had not been 
completing properly and kept timing out.


{*}Troubleshooting{*}: 
Investigation showed that KE was restarted on the night of April 5, and 
garbage cleanup was triggered for the first time on the morning of April 6. 
Once triggered, the query history cleanup thread kept deleting records and 
never finished. As a result, subsequent scheduled garbage cleanup tasks could 
not complete.

Query history is deleted 2,000 rows at a time, and one of the customer's 
projects needed to delete 550,000 query history records. According to 
kylin.log, deletion was slow because of table locking, and a single delete 
operation took more than 20 minutes in some cases.
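
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Java. The class and method names are hypothetical and not part of Kylin, and it assumes every batch takes the reported worst case of 20 minutes, so it gives an upper-bound estimate rather than a measurement.

```java
import java.time.Duration;

// Hypothetical helper, not part of Kylin: estimates the duration of a batched delete.
final class CleanupEstimate {
    // Number of delete batches needed, rounding up for a partial final batch.
    static long batches(long rows, long batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Total time under the assumption that every batch takes perBatch.
    static Duration total(long rows, long batchSize, Duration perBatch) {
        return perBatch.multipliedBy(batches(rows, batchSize));
    }

    public static void main(String[] args) {
        // 550,000 rows, 2,000 rows per batch, 20 minutes per batch (assumed worst case).
        Duration estimate = total(550_000, 2_000, Duration.ofMinutes(20));
        System.out.println(batches(550_000, 2_000) + " batches, about "
                + estimate.toHours() + " hours");
    }
}
```

Under that assumption the cleanup needs 275 batches and roughly 91 hours, far more than the roughly four-hour budget (14399995 ms) that the ScheduleService log reports at the start of the routine task.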

The following log shows the main garbage cleanup thread waiting for the query 
history cleanup to finish. The query history cleanup did not finish, so the 
main thread timed out and exited.
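
The behavior described above matches how `java.util.concurrent` futures work in general: a timeout on `Future.get` only stops the waiting thread, not the task it is waiting for. The following is a minimal sketch of that behavior, not Kylin's actual code; `TimeoutDemo` and `workerSurvivesTimeout` are hypothetical names, and `Thread.sleep` stands in for the long-running delete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo, not Kylin code.
final class TimeoutDemo {
    // Returns true if the worker is still running shortly after the waiter timed out.
    static boolean workerSurvivesTimeout(boolean cancelOnTimeout) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch workerStopped = new CountDownLatch(1);
        Future<?> future = pool.submit(() -> {
            try {
                Thread.sleep(2_000); // stands in for the long-running delete
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cancelled: stop early
            } finally {
                workerStopped.countDown();
            }
        });
        try {
            try {
                future.get(100, TimeUnit.MILLISECONDS); // the waiter's time budget
            } catch (TimeoutException e) {
                // The waiter gives up here; the worker keeps running unless cancelled.
                if (cancelOnTimeout) {
                    future.cancel(true); // interrupts the worker thread
                }
            }
            // Give the worker a short window to stop, then report whether it did.
            return !workerStopped.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cancel, worker still running: " + workerSurvivesTimeout(false));
        System.out.println("with cancel, worker still running: " + workerSurvivesTimeout(true));
    }
}
```

Note that `cancel(true)` only helps when the task responds to interruption. A thread blocked in a database call may not react to an interrupt, so a real fix would likely also need the worker to check for cancellation between batches.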


{code:shell}
2023-04-06T00:00:00,015 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 14399995 ms
2023-04-06T00:01:52,649 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
...
2023-04-06T04:00:00,012 WARN  [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
java.util.concurrent.TimeoutException: null
        at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
        at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
        at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
        at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke(<generated>) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
{code}

The following log covers the period up to the latest time available. After 
09:00 the query history deletion is still in progress; it did not stop when 
the main thread terminated.
{code:shell}
2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <==      Total: 12
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CPIC_FRP> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXAIMA> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCDC> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project<CXCRMS> is less than the maximum limit, so skip it.
2023-04-06T00:08:43,017 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project<CXCZH>, records:1551669
...
2023-04-06T09:03:54,974 INFO  [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==>  Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
{code}


 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
