wujinhu created EAGLE-920:
-----------------------------
Summary: mr failed job trouble shooting
Key: EAGLE-920
URL: https://issues.apache.org/jira/browse/EAGLE-920
Project: Eagle
Issue Type: Improvement
Components: App::Job Performance Monitor
Affects Versions: v0.5.0
Reporter: wujinhu
Assignee: wujinhu
Fix For: v0.5.0
We will follow the steps below when we find a failed MapReduce (MR) job.
1. Get the error category distribution of the job via the API:
query=TaskAttemptErrorCategoryService[@site="sandbox" and @jobId="job_1486726244016_162594"]<@errorCategory>{count}
2. Get the mapping from error category to error messages, together with the list of failed task attempts:
query=JobErrorMappingService[@site="sandbox" and @jobId="job_1486726244016_162594" and @errorCategory="java.lang.RuntimeException"]
3. Dive into a single task attempt:
query=TaskAttemptExecutionService[@site="sandbox" and @taskAttemptId="attempt_1486726244016_162594_m_002451_1"]
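
The three queries above could be assembled programmatically, for example as in the following minimal Python sketch. It only builds and URL-encodes the query strings; it does not contact a server. The base URL, the function names, and the use of a "query" request parameter are assumptions made for illustration, since this issue only says "via api" and does not specify an endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the issue does not state the API URL.
BASE_URL = "http://localhost:9090/rest/entities"


def build_query(service, filters, aggregate=None):
    """Build a query of the form Service[@k="v" and ...]<@field>{func}."""
    conditions = " and ".join(
        f'@{key}="{value}"' for key, value in filters.items()
    )
    query = f"{service}[{conditions}]"
    if aggregate is not None:
        group_by, func = aggregate
        # Doubled braces emit literal { and } around the function name.
        query += f"<@{group_by}>{{{func}}}"
    return query


def build_url(query, base_url=BASE_URL):
    """URL-encode the query and attach it as the 'query' parameter (assumed)."""
    return f"{base_url}?{urlencode({'query': query})}"


def troubleshooting_queries(site, job_id, error_category, task_attempt_id):
    """Return the queries for steps 1 to 3, in order."""
    return [
        # Step 1: error category distribution for the job.
        build_query(
            "TaskAttemptErrorCategoryService",
            {"site": site, "jobId": job_id},
            aggregate=("errorCategory", "count"),
        ),
        # Step 2: error messages and failed attempts for one category.
        build_query(
            "JobErrorMappingService",
            {"site": site, "jobId": job_id, "errorCategory": error_category},
        ),
        # Step 3: details of a single task attempt.
        build_query(
            "TaskAttemptExecutionService",
            {"site": site, "taskAttemptId": task_attempt_id},
        ),
    ]


if __name__ == "__main__":
    for q in troubleshooting_queries(
        "sandbox",
        "job_1486726244016_162594",
        "java.lang.RuntimeException",
        "attempt_1486726244016_162594_m_002451_1",
    ):
        print(build_url(q))
```

In practice the error category passed to step 2 would be taken from the result of step 1, and the task attempt ID passed to step 3 from the result of step 2.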