[ https://issues.apache.org/jira/browse/KYLIN-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551760#comment-17551760 ]
Ibrar Ahmed commented on KYLIN-5186:
------------------------------------
[~mukvin] here are the outputs for the steps mentioned above:
*1)* mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 312   |
+-----------------+-------+
*2)* The MySQL properties look fine, as we are using AWS RDS.
*3)* ulimit -a output on the Kylin machine:
*[kylin@kylin ~]$ ulimit -a*
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 253079
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 253079
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
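The limits printed above are those of the login shell. A daemon that was started earlier, or through an init script, can be running with different values, so it may be worth reading the limits of the running process itself. Below is a minimal sketch, assuming a Linux host where /proc/<pid>/limits is readable; the helper names are hypothetical, and the pid of the Kylin process has to be looked up separately.

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are kept as strings because a limit can be the word "unlimited".
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with runs of spaces; limit names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def open_files_limit(pid):
    """Return (soft, hard) 'Max open files' for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_proc_limits(handle.read())["Max open files"]
```

Calling open_files_limit() with the pid of the Kylin JVM shows whether the 1024 reported by the shell is also what the server process runs with. Whether that limit is related to this issue is not established here.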
*NOTE:* The error type also changes between occurrences. The following ERROR
also appears in the Kylin logs, in case it helps to understand the problem:
{code:java}
2022-06-05 11:22:38,496 INFO [FetcherRunner 1426791806-37] threadpool.DefaultFetcherRunner:111 : Job Fetcher: 14 should running, 14 actual running, 0 stopped, 0 ready, 13805 already succeed, 2 error, 257 discarded, 0 others
2022-06-05 11:22:45,359 ERROR [Scheduler 869401759 Job c64fa851-c0e8-7c96-69d4-64132150f1de-103] common.HadoopJobStatusChecker:58 : error check status
java.io.IOException: Job status not available
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:331)
    at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:338)
    at org.apache.kylin.engine.mr.common.HadoopJobStatusChecker.checkStatus(HadoopJobStatusChecker.java:38)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:181)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-06-05 11:22:45,389 ERROR [Scheduler 869401759 Job c64fa851-c0e8-7c96-69d4-64132150f1de-103] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=c64fa851-c0e8-7c96-69d4-64132150f1de-10, name=Build N-Dimension Cuboid : level 4, state=RUNNING}
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.Job.getTrackingURL(Job.java:363)
    at org.apache.kylin.engine.mr.common.HadoopCmdOutput.getInfo(HadoopCmdOutput.java:66)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:199)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-06-05 11:22:45,393 INFO [Scheduler 869401759 Job c64fa851-c0e8-7c96-69d4-64132150f1de-103] execution.AbstractExecutable:539 : Pause 30000 milliseconds before retry
2022-06-05 11:23:04,737 INFO [Scheduler 869401759 Job 8b16d316-88c9-c0e6-15aa-e92d2c981020-94] cube.CubeManager:988 : Promoting cube CUBE[name=NAV_OUTCOMES_CUBE_07092021], new segment NAV_OUTCOMES_CUBE_07092021[FULL_BUILD], to remove segments [NAV_OUTCOMES_CUBE_07092021[FULL_BUILD]]
{code}
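To see how often each error type occurs per job, the ERROR entries in the log can be tallied by job id and logging class. The following is a minimal sketch, assuming each log entry header sits on a single line in the log file and follows the layout of the excerpt above. The function name and regular expressions are hypothetical and only fitted to these sample lines.

```python
import re
from collections import Counter

# Header of a log entry as seen in the excerpt:
# "<timestamp> <LEVEL> [<thread info>] <class>:<line> : <message>"
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +\[(?P<thread>[^\]]*)\] "
    r"(?P<source>\S+) : (?P<message>.*)$"
)
# Job UUID as printed in the thread name, e.g. "Scheduler 869401759 Job <uuid>-103".
JOB_ID = re.compile(r"Job (?P<job>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")


def count_errors_by_job(lines):
    """Count ERROR entries per (job uuid, logging class:line) pair."""
    counts = Counter()
    for line in lines:
        entry = ENTRY.match(line)
        if entry is None or entry.group("level") != "ERROR":
            continue  # stack-trace lines and non-ERROR entries are skipped
        job = JOB_ID.search(entry.group("thread"))
        key = (job.group("job") if job else "unknown", entry.group("source"))
        counts[key] += 1
    return counts
```

Passing an open log file to count_errors_by_job() and printing counts.most_common() would show which jobs hit HadoopJobStatusChecker or MapReduceExecutable errors most often.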
> few jobs do not get registered to a cluster for processing by scheduler
> -----------------------------------------------------------------------
>
> Key: KYLIN-5186
> URL: https://issues.apache.org/jira/browse/KYLIN-5186
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v3.0.1
> Environment: EMR
> Reporter: Ibrar Ahmed
> Priority: Critical
>
> I am using Kylin version:
> *Version: Apache kylin 3.0.1*
> Commit: 638a1eb68f257366d240105b33e5eea3bfa4dbf3
> I have *30+ Kylin cube build jobs*, but in a given week one or two jobs do
> not get registered in the cluster even though *the scheduler tries to send
> them to the cluster*.
> The error usually occurs at the step *Build N-Dimension Cuboid*.
> I get the following error:
> {code:java}
> 2022-05-20 00:26:04,873 ERROR [Scheduler 1065041011 Job 7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-112] common.HadoopJobStatusChecker:58 : error check status
> 2022-05-20 00:26:04,907 ERROR [Scheduler 1065041011 Job 7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-112] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=7df4f62c-b58f-b959-8ae9-ee0e5b2438fb-11, name=Build N-Dimension Cuboid : level 6, state=RUNNING}
> 2022-05-17 23:45:28,237 ERROR [Scheduler 1065041011 Job 4cc6e7c9-98e6-3f3c-db26-5e45e0455cab-129] common.MapReduceExecutable:259 : error execute MapReduceExecutable{id=4cc6e7c9-98e6-3f3c-db26-5e45e0455cab-05, name=Build Base Cuboid, state=RUNNING}
> java.lang.RuntimeException: org.apache.kylin.job.exception.PersistentException: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
>     at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:178)
>     at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:389)
>     at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:515)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:179)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kylin.job.exception.PersistentException: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
>     at org.apache.kylin.job.dao.ExecutableDao.getJobOutput(ExecutableDao.java:407)
>     at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:173)
>     ... 10 more
> Caused by: java.io.IOException: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
> {code}
> *Note*: if I *pause and resume* the job, it gets registered and *works
> fine*.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)