[ https://issues.apache.org/jira/browse/KYLIN-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552684#comment-17552684 ]

Ibrar Ahmed commented on KYLIN-5186:
------------------------------------

[~mukvin] Yesterday I changed the value as suggested and restarted Kylin:
[kylin@kylin ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 253079
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
{color:#de350b}open files                      (-n) 65536{color}
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 253079
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
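The ulimit -a output above describes the login shell it was run in; the already-running Kylin JVM may have been started with different limits. A minimal sketch, assuming a HotSpot/OpenJDK JVM on Linux, that reads the open-file limit and current usage from inside a JVM through com.sun.management.UnixOperatingSystemMXBean. FdLimitCheck is a hypothetical helper name, not part of Kylin:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper (not part of Kylin): reports the file-descriptor limit
// and the number of open descriptors as seen by the JVM it runs in.
final class FdLimitCheck {

    // Returns {maxFds, openFds}, or null if the platform bean is not a Unix one.
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getMaxFileDescriptorCount(), unix.getOpenFileDescriptorCount()};
        }
        return null; // e.g. Windows, where the bean is not a UnixOperatingSystemMXBean
    }

    // True when the limit seen by this JVM is at least `expected` (e.g. 65536).
    static boolean limitAtLeast(long expected) {
        long[] counts = fdCounts();
        return counts != null && counts[0] >= expected;
    }

    private FdLimitCheck() {}
}
{code}

To check the Kylin process itself, these calls would have to run inside the Kylin JVM; alternatively, the "Max open files" row of /proc/<pid>/limits shows the limit the process actually got.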

*38 of 40 jobs finished fine; the remaining 2 cubes failed with the following error:*
Everything else is unchanged.
{code:java}
BUILD CUBE - NAV_INSIGHTS_ANALYSIS_COMP_04052021_3_2    085ef935-3f4e-88b9-7069-c62ad5348034
2022-06-10 00:17:39,006 ERROR [Scheduler 588496686 Job 085ef935-3f4e-88b9-7069-c62ad5348034-203] common.HadoopJobStatusChecker:58 : error check status
java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:331)
        at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:338)
        at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:181)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-06-10 00:17:39,037 ERROR [Scheduler 588496686 Job 085ef935-3f4e-88b9-7069-c62ad5348034-203] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=085ef935-3f4e-88b9-7069-c62ad5348034-06, name=Build Base Cuboid, state=RUNNING}
java.lang.NullPointerException
        at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:363)
        at org.apache.kylin.engine.mr.common.HadoopCmdOutput.getInfo(HadoopCmdOutput.java:66)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:199)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-06-10 00:17:39,040 INFO  [Scheduler 588496686 Job 085ef935-3f4e-88b9-7069-c62ad5348034-203] execution.AbstractExecutable:539 : Pause 30000 milliseconds before retry
{code}
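Reading the two traces together: Job.getStatus() first fails with "IOException: Job status not available", and the later call to Job.getTrackingURL() from HadoopCmdOutput.getInfo then throws a NullPointerException, which is consistent with the job's status still being null after the failed refresh. A minimal sketch of a null-safe tracking-URL lookup, assuming a hypothetical JobHandle/Status pair in place of Hadoop's Job so the example compiles on its own; this is an illustration of the failure chain, not Kylin's code or a confirmed fix:

{code:java}
import java.io.IOException;
import java.util.Optional;

// Hypothetical sketch (not Kylin or Hadoop code).
final class StatusGuard {

    // Hypothetical stand-in for a job status snapshot (not Hadoop's JobStatus).
    static final class Status {
        final String trackingUrl;
        Status(String trackingUrl) { this.trackingUrl = trackingUrl; }
    }

    // Hypothetical stand-in for a submitted job whose status lookup can fail,
    // playing the role of org.apache.hadoop.mapreduce.Job in this sketch.
    interface JobHandle {
        Status fetchStatus() throws IOException;
    }

    // Returns the tracking URL only if a status snapshot was actually obtained.
    static Optional<String> trackingUrl(JobHandle job) {
        final Status status;
        try {
            status = job.fetchStatus();
        } catch (IOException e) {
            // Same situation as "Job status not available": no URL to read.
            return Optional.empty();
        }
        if (status == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(status.trackingUrl);
    }

    private StatusGuard() {}
}
{code}

The caller can then log the missing URL and let the step retry, instead of letting a NullPointerException end the step.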

> few jobs do not get registered to a cluster for processing by scheduler
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-5186
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5186
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v3.0.1
>         Environment: EMR
>            Reporter: Ibrar Ahmed
>            Priority: Critical
>
> I am using Kylin version:
> *Version: Apache Kylin 3.0.1*
> Commit: 638a1eb68f257366d240105b33e5eea3bfa4dbf3
> I have *30+ Kylin cube build jobs, but* each week one or two of them do not 
> get registered in the cluster even though the *SCHEDULER tries to send the jobs 
> to the cluster*.
> The failure usually happens on the step *Build N-Dimension Cuboid*, 
> with the following error:
> {code:java}
> 2022-05-20 00:26:04,873 ERROR [Scheduler 1065041011 Job 7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-112] common.HadoopJobStatusChecker:58 : error check status
> 2022-05-20 00:26:04,907 ERROR [Scheduler 1065041011 Job 7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-112] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-11, name=Build N-Dimension Cuboid : level 6, state=RUNNING}
> 2022-05-17 23:45:28,237 ERROR [Scheduler 1065041011 Job 4cc6e7c9-98e6-3f3c-db26-5e45e0455cab-129] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=4cc6e7c9-98e6-3f3c-db26-5e45e0455cab-05, name=Build Base Cuboid, state=RUNNING}
> java.lang.RuntimeException: org.apache.kylin.job.exception.PersistentException: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
> at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:178)
> at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:389)
> at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:515)
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:179)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kylin.job.exception.PersistentException: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
> at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:407)
> at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:173)
> ... 10 more
> Caused by: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> {code}
> {*}Note{*}: if I *pause and resume* the jobs, they get registered and {*}work 
> fine{*}.
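The Note above says that pausing and resuming an affected job gets it registered, and the log in the comment shows Kylin itself pausing 30000 milliseconds before a retry. The general pattern of retrying a step after a fixed pause when a transient error occurs (such as the MySQL "Communications link failure") can be sketched as follows. RetryWithPause and its parameters are hypothetical names; this is not Kylin's retry code, and it does not explain why the job failed to register in the first place:

{code:java}
import java.util.concurrent.Callable;

// Hypothetical retry helper (not Kylin's implementation): re-runs a step
// after a fixed pause, up to maxAttempts times in total.
final class RetryWithPause {

    static <T> T call(Callable<T> step, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();
            } catch (Exception e) {
                last = e; // remember the failure, e.g. a transient I/O error
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // fixed pause before the next attempt
                }
            }
        }
        throw last; // all attempts failed: surface the most recent error
    }

    private RetryWithPause() {}
}
{code}

A fixed pause keeps the sketch close to the logged behaviour; a real job engine might prefer exponential backoff and should only retry errors that are known to be transient.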



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
