wujinhu created EAGLE-920:
-----------------------------

             Summary: MR failed job troubleshooting
                 Key: EAGLE-920
                 URL: https://issues.apache.org/jira/browse/EAGLE-920
             Project: Eagle
          Issue Type: Improvement
          Components: App::Job Performance Monitor
    Affects Versions: v0.5.0
            Reporter: wujinhu
            Assignee: wujinhu
             Fix For: v0.5.0


We will follow the steps below when we find a failed MR job.
1. Get the error category distribution of the job via the API:
query=TaskAttemptErrorCategoryService[@site="sandbox" and @jobId="job_1486726244016_162594"]<@errorCategory>{count}
2. Get the error category to error message mapping and the list of failed task attempts:
query=JobErrorMappingService[@site="sandbox" and @jobId="job_1486726244016_162594" and @errorCategory="java.lang.RuntimeException"]
3. Dive into one task attempt:
query=TaskAttemptExecutionService[@site="sandbox" and @taskAttemptId="attempt_1486726244016_162594_m_002451_1"]
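
The three queries above share one shape: a service name, a bracketed list of @field="value" filters joined by "and", and an optional <@field>{aggregate} suffix. A minimal Python sketch of assembling them is given here. The helper names eagle_query and as_query_string are hypothetical and not part of Eagle, and the sketch only builds the "query=" parameter; the host and REST path it would be appended to are deployment-specific and deliberately left out.

```python
from urllib.parse import urlencode


def eagle_query(service, filters, group_by=None, aggregate=None):
    """Build a query expression in the shape used by the three steps.

    Hypothetical helper: it only concatenates strings and does not
    validate service or field names against Eagle.
    """
    conditions = " and ".join(f'@{key}="{value}"' for key, value in filters.items())
    expr = f"{service}[{conditions}]"
    if group_by and aggregate:
        # Aggregation suffix, e.g. <@errorCategory>{count}
        expr += f"<@{group_by}>{{{aggregate}}}"
    return expr


def as_query_string(expr):
    """URL-encode the expression as a 'query=...' parameter."""
    return urlencode({"query": expr})


site = "sandbox"
job_id = "job_1486726244016_162594"

# Step 1: error category distribution of the job
distribution_query = eagle_query(
    "TaskAttemptErrorCategoryService",
    {"site": site, "jobId": job_id},
    group_by="errorCategory",
    aggregate="count",
)

# Step 2: error messages and failed task attempts for one category
mapping_query = eagle_query(
    "JobErrorMappingService",
    {"site": site, "jobId": job_id, "errorCategory": "java.lang.RuntimeException"},
)

# Step 3: details of a single task attempt
attempt_query = eagle_query(
    "TaskAttemptExecutionService",
    {"site": site, "taskAttemptId": "attempt_1486726244016_162594_m_002451_1"},
)

if __name__ == "__main__":
    for expr in (distribution_query, mapping_query, attempt_query):
        print(as_query_string(expr))
```

In practice the error category used in step 2 would be taken from the result of step 1, and the task attempt ID used in step 3 from the result of step 2.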



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
