[ https://issues.apache.org/jira/browse/KYLIN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521777#comment-16521777 ]
Shaofeng SHI commented on KYLIN-3282:
-------------------------------------

Hi Xingxing, do you have any insight into why the job disappeared from the "Monitor" view? If the job appears there, you can discard it. Are there any other errors in the log?

> hbase timeout cause the endless status.
> ---------------------------------------
>
> Key: KYLIN-3282
> URL: https://issues.apache.org/jira/browse/KYLIN-3282
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.3.0
> Reporter: readme_kylin
> Priority: Major
>
> ri Mar 09 12:52:07 GMT+08:00 2018, RpcRetryingCaller{globalStartTime=1520571112216, pause=100, retries=1}, java.io.IOException: Call to QZ140/10.0.0.140:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8030361, waitTime=15002, operationTimeout=15000 expired.
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
>     at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:1233)
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:311)
>     at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:305)
>     at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:291)
>     at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:271)
>     at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:88)
>     at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:216)
>     at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:480)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:161)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> 2018-03-09 12:52:10,191 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:100 : 1th retries for onExecuteFinished fails due to {}
> java.lang.IllegalStateException: Overwriting conflict /execute_output/499477a7-4c1a-4c5a-8d4a-0b3218a58dca-13, expect old TS 1520571099067, but it is 1520571112216
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:316)
>     at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:305)
>     at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:291)
>     at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:271)
>     at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:88)
>     at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:216)
>     at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:480)
>     at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:490)
>     at org.apache.kylin.job.execution.AbstractExecutable.addExtraInfo(AbstractExecutable.java:403)
>     at org.apache.kylin.job.execution.AbstractExecutable.setEndTime(AbstractExecutable.java:415)
>     at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinished(AbstractExecutable.java:121)
>     at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:98)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:175)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2018-03-09 12:52:10,193 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:164 : error running Executable: CubingJob{id=499477a7-4c1a-4c5a-8d4a-0b3218a58dca, name=BUILD CUBE - android_download_model_1_2_cube_1_3 - 20180309000000_20180310000000 - GMT+08:00 2018-03-09 12:28:58, state=RUNNING}
> 2018-03-09 12:52:10,193 INFO [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:157 : Retry 1
> 2018-03-09 12:52:10,313 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:108 : There shouldn't be a running subtask[jobId: 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-13, jobName: Build N-Dimension Cuboid : level 7], it might cause endless state, will retry to fetch subtask's state.
> 2018-03-09 12:52:10,414 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 1 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,525 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 2 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,626 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 3 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,737 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 4 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,839 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 5 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,945 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 6 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,047 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 7 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,157 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 8 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,260 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 9 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,362 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 10 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,363 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:195 : Parent task: BUILD CUBE - android_download_model_1_2_cube_1_3 - 20180309000000_20180310000000 - GMT+08:00 2018-03-09 12:28:58 is finished, but it's subtask: Build N-Dimension Cuboid : level 7's state is still RUNNING , mark parent task failed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)