[ 
https://issues.apache.org/jira/browse/KYLIN-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832725#comment-17832725
 ] 

ASF subversion and git services commented on KYLIN-5745:
--------------------------------------------------------

Commit 180ff07afeaf64da8e565b7aa5882c3a89f1c268 in kylin's branch 
refs/heads/kylin5 from fengguangyuan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=180ff07afe ]

KYLIN-5745 Using a global thread pool to clean underlying storages

1. Using a global thread pool to clean underlying storages;
2. Launching cleaning tasks in the local thread and to ignore
FileNotFoundException while collecting HDFS files.

Co-authored-by: Guangyuan Feng <guangyuan.f...@kyligence.io>
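
The two points of the commit message can be illustrated with a minimal
sketch. It is not the code from this commit: the class
`StorageCleanupSketch`, the `FileLister` interface, the pool size of 4, and
the method names are all hypothetical, and `FileLister` only stands in for a
file system client such as HDFS. The sketch keeps one shared pool for all
cleanup runs and treats a `FileNotFoundException` during file collection as
"already gone" rather than as a failure.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the Kylin code base.
class StorageCleanupSketch {

    /** Stand-in for a file system client such as HDFS. */
    interface FileLister {
        List<String> list(String dir) throws IOException;
    }

    // One pool shared by every cleanup run, instead of a new pool per run.
    // Daemon threads, so the pool never blocks JVM shutdown.
    static final ExecutorService GLOBAL_CLEAN_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "storage-clean-worker");
        t.setDaemon(true);
        return t;
    });

    /** Collects files in the calling thread, skipping directories that vanished. */
    static List<String> collectFiles(FileLister lister, List<String> dirs) {
        List<String> files = new ArrayList<>();
        for (String dir : dirs) {
            try {
                files.addAll(lister.list(dir));
            } catch (FileNotFoundException e) {
                // Directory was removed concurrently: nothing left to clean there.
            } catch (IOException e) {
                // Any other I/O failure is still reported to the caller.
                throw new UncheckedIOException(e);
            }
        }
        return files;
    }

    /** Submits one delete task per file to the shared pool. */
    static List<Future<?>> submitDeletes(List<String> files, Consumer<String> deleter) {
        List<Future<?>> futures = new ArrayList<>();
        for (String file : files) {
            futures.add(GLOBAL_CLEAN_POOL.submit(() -> deleter.accept(file)));
        }
        return futures;
    }
}
```

A caller would first run `collectFiles` and then pass the result to
`submitDeletes`, keeping the returned futures to wait on or cancel.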


> The historical garbage cleanup task was not completed, causing the subsequent 
> scheduled garbage cleanup task cannot be executed normally
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5745
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5745
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0-beta
>            Reporter: zhong.zhu
>            Assignee: zhong.zhu
>            Priority: Major
>             Fix For: 5.0.0
>
>
> {*}Problem description{*}: 
> The scheduled garbage cleanup operation cannot complete successfully.
> {*}Background{*}: 
> The customer found that Kylin had a large number of small files occupying 
> HDFS storage that needed to be cleaned up. When we checked the customer's 
> environment, we found that the scheduled garbage cleanup had never completed 
> properly and kept timing out.
> *Troubleshooting:*
> The check showed that the customer's garbage cleanup was first triggered on 
> the morning of April 6, after Kylin had been restarted on the night of 
> April 5. Once this cleanup was triggered, the query history cleanup thread 
> kept deleting records from then on. As a result, subsequent scheduled 
> garbage cleanup tasks could not complete.
> Query histories are deleted 2,000 rows at a time, and one of the customer's 
> projects needed to delete 550,000 query history records. According to 
> kylin.log, table locking made the deletes slow, and a single delete 
> operation could take more than 20 minutes.
> The following log shows the main garbage cleanup thread waiting for the 
> query history cleanup to complete. The query history cleanup did not finish, 
> so the main thread timed out and exited.
> {code:shell}
> 2023-04-06T00:00:00,015 INFO  [RoutineOpsWorker-287] service.ScheduleService 
> : execute task MetadataBackup with remaining time: 14399995 ms
> 2023-04-06T00:01:52,649 INFO  [RoutineOpsWorker-287] service.ScheduleService 
> : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
> ...
> 2023-04-06T04:00:00,012 WARN  [DefaultTaskScheduler-3] 
> service.ScheduleService : Routine task execution timeout
> java.util.concurrent.TimeoutException: null
>       at java.util.concurrent.FutureTask.get(FutureTask.java:205) 
> ~[?:1.8.0_242]
>       at 
> org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107)
>  ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
>       at 
> org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77)
>  ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
>       at 
> org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke(<generated>)
>  ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
> {code}
> The following log runs up to the latest time covered by the provided logs. 
> After 09:00 the query history cleanup was still deleting records and had 
> not stopped when the main thread terminated.
> {code:shell}
> 2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] 
> QueryHistoryMapper.selectByProject : <==      Total: 12
> 2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] 
> util.QueryHisStoreUtil : Query histories of project<CPIC_FRP> is less than 
> the maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] 
> util.QueryHisStoreUtil : Query histories of project<CXAIMA> is less than the 
> maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] 
> util.QueryHisStoreUtil : Query histories of project<CXCDC> is less than the 
> maximum limit, so skip it.
> 2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] 
> util.QueryHisStoreUtil : Query histories of project<CXCRMS> is less than the 
> maximum limit, so skip it.
> 2023-04-06T00:08:43,017 INFO  [QueryHistoryCleanWorker-23145] 
> util.QueryHisStoreUtil : Start to delete query histories that are beyond max 
> size for project<CXCZH>, records:1551669
> ...
> 2023-04-06T09:03:54,974 INFO  [QueryHistoryCleanWorker-23145] 
> query.JdbcQueryHistoryStore : Delete 2000 row query history for project 
> [CXCZH] takes 938060 ms
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] 
> QueryHistoryMapper.delete : ==>  Preparing: delete from 
> ke4_instance_query_history_realization where query_time < ? and project_name 
> = ?
> 2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] 
> QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
> {code}
>  
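
The behaviour in the quoted logs follows from how `Future.get` with a timeout
works: when it throws `TimeoutException`, only the waiting thread gives up,
and the submitted task keeps running unless it is cancelled and reacts to
interruption. At 2,000 rows per batch, 550,000 rows need 275 delete
statements, so a slow worker can outlive the scheduler's time budget by
hours. The sketch below is one way to stop such a leftover worker. It is an
assumption for illustration, not the fix applied in Kylin, and
`RoutineTimeoutSketch`, `deleteInBatches`, and `runWithBudget` are
hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the Kylin code base.
class RoutineTimeoutSketch {

    /**
     * Deletes `total` rows in batches and returns the number of batches run.
     * The interruption check between batches lets a cancelled task stop early.
     */
    static int deleteInBatches(int total, int batchSize, AtomicInteger deleted) {
        int batches = 0;
        while (deleted.get() < total && !Thread.currentThread().isInterrupted()) {
            int n = Math.min(batchSize, total - deleted.get());
            deleted.addAndGet(n); // stands in for one batched SQL delete
            batches++;
        }
        return batches;
    }

    /** Runs a task with a time budget; returns true if it finished in time. */
    static boolean runWithBudget(ExecutorService pool, Runnable task, long timeoutMs) {
        Future<?> future = pool.submit(task);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // A timed-out get() does not stop the worker; cancel(true) interrupts it.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Cancellation only helps if the task cooperates: a worker blocked inside a
single long-running JDBC statement may not notice the interrupt until that
statement returns.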



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
