[ https://issues.apache.org/jira/browse/KYLIN-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016446#comment-17016446 ]
ASF GitHub Bot commented on KYLIN-4328:
---------------------------------------
ggKe commented on pull request #1064: KYLIN-4328 Add cache for succeed jobid in
defaultFetcherRunner
URL: https://github.com/apache/kylin/pull/1064
> When hbase and kylin that are not in the same IDC and found that the build
> task became very slow during scheduling.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-4328
> URL: https://issues.apache.org/jira/browse/KYLIN-4328
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v2.2.0, v3.0.0, v2.6.3
> Environment: Centos 7.4
> hbase 1.2.4
> hive 1.1.1
> hadoop 2.7.2
> Reporter: GuKe
> Assignee: GuKe
> Priority: Major
>
> When HBase and Kylin are not deployed in the same IDC, we found that build
> tasks became very slow during scheduling.
> We traced the slowdown to the following part of the code.
> The method getExecutableManager().getAllJobIdsInCache() reads every job ID.
> Our server currently holds more than 35,000 jobs, and each job ID triggers
> at least two HBase accesses to read the job state.
> Most of these jobs are already in the succeed state, which never changes.
> When Kylin and HBase are in the same IDC, the network latency of each HBase
> access is less than 1 ms.
> Across IDCs, however, each HBase access takes more than 5 ms, so the
> accumulated delay is considerable.
> As a result, each scheduling pass takes a long time to run.
> We can therefore add a cache that holds the IDs of succeeded jobs, filled on
> the first scheduling pass after the service starts.
> After we modified the code, the run time dropped from 10 minutes to 20 seconds.
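
The numbers in the description give a rough lower bound: 35,000 jobs x 2
reads x 5 ms is at least 350 seconds per scheduling pass, assuming the reads
happen one after another. The proposed cache can be sketched as below. This
is only an illustration of the idea, not the code from the pull request:
SucceedJobIdCache, State, schedulePass and remoteLookup are hypothetical
names, and remoteLookup stands in for whatever call reads a job's state from
HBase.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Kylin's real classes.
final class SucceedJobIdCache {

    // Stand-in for the job states; only SUCCEED is treated as final here.
    enum State { READY, RUNNING, ERROR, SUCCEED }

    // IDs of jobs already seen in the SUCCEED state; their state is never re-read.
    private final Set<String> succeedJobIds = ConcurrentHashMap.newKeySet();

    // Assumed stand-in for the remote (HBase-backed) job state lookup.
    private final Function<String, State> remoteLookup;

    // Counts remote lookups so the saving can be observed.
    private final AtomicInteger remoteReads = new AtomicInteger();

    SucceedJobIdCache(Function<String, State> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // One scheduling pass: returns how many jobs are not yet succeeded.
    int schedulePass(List<String> allJobIds) {
        int notSucceeded = 0;
        for (String id : allJobIds) {
            if (succeedJobIds.contains(id)) {
                continue; // cached as succeeded, skip the remote read
            }
            State state = remoteLookup.apply(id);
            remoteReads.incrementAndGet();
            if (state == State.SUCCEED) {
                succeedJobIds.add(id); // remember it for later passes
            } else {
                notSucceeded++;
            }
        }
        return notSucceeded;
    }

    int getRemoteReads() {
        return remoteReads.get();
    }
}
```

The first pass after startup still reads every job once to fill the cache.
Later passes only touch jobs that had not succeeded yet, which matches the
observation that most jobs are already in a state that will not change.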
--
This message was sent by Atlassian Jira
(v8.3.4#803005)