[jira] [Commented] (KYLIN-3021) Check MapReduce job failed reason and include the diagnostics into email notification

Zhong Yanghong (JIRA) Sun, 23 Dec 2018 19:24:47 -0800


    [ https://issues.apache.org/jira/browse/KYLIN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728112#comment-16728112 ]


Zhong Yanghong commented on KYLIN-3021:
---------------------------------------

With this fix, we get more detailed error information when a MapReduce job fails. For example:
{code}
Counters: 33
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=1203010
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2261
                HDFS: Number of bytes written=4661
                HDFS: Number of read operations=58
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=18
        Job Counters 
                Failed map tasks=4
                Killed map tasks=1
                Launched map tasks=11
                Other local map tasks=9
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=1038159
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=346053
                Total vcore-seconds taken by all map tasks=346053
                Total megabyte-seconds taken by all map tasks=974485248
        Map-Reduce Framework
                Map input records=6
                Map output records=0
                Input split bytes=1484
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=598
                CPU time spent (ms)=26830
                Physical memory (bytes) snapshot=3199647744
                Virtual memory (bytes) snapshot=25723592704
                Total committed heap usage (bytes)=8977383424
        File Input Format Counters 
                Bytes Read=732
        File Output Format Counters 
                Bytes Written=720
Job Diagnostics:Task failed task_1544857205985_80511_m_000007
Job failed as tasks failed. failedMaps:1 failedReduces:0

Failure task Diagnostics:
Error: java.lang.IllegalStateException: Table snapshot should be no greater than 300 MB, but 
...
...
...
        at org.apache.kylin.dict.lookup.SnapshotManager.checkBeforeBuild(SnapshotManager.java:141)
        at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshotOnly(SnapshotManager.java:166)
        at org.apache.kylin.engine.mr.steps.BuildDictionaryMapper.buildSnapshot(BuildDictionaryMapper.java:290)
        at org.apache.kylin.engine.mr.steps.BuildDictionaryMapper.doCleanup(BuildDictionaryMapper.java:191)
        at org.apache.kylin.engine.mr.KylinMapper.cleanup(KylinMapper.java:71)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
{code}
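
To illustrate how the diagnostics portion of such output could be pulled out for an email body, here is a minimal sketch. It is not the code from the KYLIN-3021 patch: the class name DiagnosticsExtractor and both method names are hypothetical, and it only assumes the text layout shown in the sample above (a "Job Diagnostics:" line, then a "Failure task Diagnostics:" line followed by an "Error:" line).

```java
/**
 * Hypothetical helper (not part of Kylin): extracts the diagnostics
 * portion from job output formatted like the sample above.
 */
final class DiagnosticsExtractor {

    // Marker strings copied from the sample output; assumed stable.
    private static final String JOB_MARKER = "Job Diagnostics:";
    private static final String TASK_MARKER = "Failure task Diagnostics:";
    private static final String ERROR_PREFIX = "Error:";

    private DiagnosticsExtractor() {
    }

    /**
     * Returns the text from the first "Job Diagnostics:" marker to the end,
     * dropping the counters that precede it. Returns an empty string when
     * the marker is absent (for example, output that is just "Counters: 0").
     */
    static String diagnosticsSection(String output) {
        int start = output.indexOf(JOB_MARKER);
        if (start < 0) {
            return "";
        }
        return output.substring(start).trim();
    }

    /**
     * Returns the message of the first "Error:" line that follows the
     * "Failure task Diagnostics:" marker, or null when there is none.
     */
    static String firstError(String output) {
        int start = output.indexOf(TASK_MARKER);
        if (start < 0) {
            return null;
        }
        for (String line : output.substring(start).split("\\r?\\n")) {
            if (line.startsWith(ERROR_PREFIX)) {
                return line.substring(ERROR_PREFIX.length()).trim();
            }
        }
        return null;
    }
}
```

With the sample above as input, firstError would return the IllegalStateException line, which could serve as a short summary in the notification, while diagnosticsSection would give the full diagnostics without the counters.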

> Check MapReduce job failed reason and include the diagnostics into email notification
> -------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3021
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3021
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>             Fix For: v2.6.0
>
>
> In the current kylin.log and failed-job email notification, there is no 
> detailed error info explaining why a MapReduce job failed. We just log "no 
> counters for job" or "Counters: 0".
>  
> 2017-08-03 18:24:10,197 WARN  [pool-10-thread-17] common.HadoopCmdOutput:90 : no counters for job job_1497957612021_709431
>  
> 2017-08-03 15:08:02,351 DEBUG [pool-10-thread-3] common.HadoopCmdOutput:95 : Counters: 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
