[
https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajesh Balamohan updated TEZ-1635:
----------------------------------
Attachment: tez_smb_1_successful_job.log
tez_smb_1_hung_job.log
Attaching logs from a successful run and a hung run of tez_smb_1.q, with
additional logging in org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.
tez_smb_1.q: (DAG snapshot is already attached)
- Map_1 [Map_2] - MultiMRInput, initializer=MRInputAMSplitGenerator
- Map_1 [s1] - MRInputLegacy, initializer=MRInputAMSplitGenerator
- (Map_1 [Map_2], Map_1 [s1]) --> Map_1[MapTezProcessor]
- Map_1[MapTezProcessor] --> Map_1[out_Map_1] MROutput
Map_1 vertexManager is org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex
Successful job:
===============
- When CustomPartitionVertex.onRootVertexInitialized() gets called for "s1",
CustomPartitionVertex.processAllEvents() is invoked, which internally populates
the bucketToTaskMap data structure.
- When CustomPartitionVertex.onRootVertexInitialized() gets called for "Map_2",
CustomPartitionVertex.processAllSideEvents() is invoked which depends on
bucketToTaskMap to generate the InputDataInformationEvent.
Failure/hung job:
=================
- CPV.onRootVertexInitialized() gets called for "Map_2" first. This ends up
calling CPV.processAllSideEvents(). Since the bucketToTaskMap structure is still
empty at that point, it does *not* generate any InputDataInformationEvent.
- CPV.onRootVertexInitialized() gets called for "s1" later.
In this case, the events pertaining to MultiMRInput (Map_2) are never sent to Tez
from CustomPartitionVertex.
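
The ordering dependence described above can be reproduced with a small toy
model. This is only a sketch under stated assumptions: the class
InitOrderSketch, the helper sideEventsSent(), the two hard-coded buckets and
the string "events" are all hypothetical, and the method bodies are far simpler
than the real Hive CustomPartitionVertex. Only the names
onRootVertexInitialized, processAllEvents, processAllSideEvents and
bucketToTaskMap are taken from the analysis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names mirror the ones used in this comment, but the bodies
// are hypothetical and much simpler than the real CustomPartitionVertex.
final class InitOrderSketch {
    // Stand-in for bucketToTaskMap (bucket id -> task id).
    private final Map<Integer, Integer> bucketToTaskMap = new LinkedHashMap<>();
    // Stand-in for the InputDataInformationEvents produced for the side input.
    private final List<String> sideEvents = new ArrayList<>();

    void onRootVertexInitialized(String inputName) {
        if ("s1".equals(inputName)) {
            processAllEvents();              // main input: fills the map
        } else {
            processAllSideEvents(inputName); // side input: reads the map
        }
    }

    private void processAllEvents() {
        // Assumption: two buckets, each mapped to its own task.
        bucketToTaskMap.put(0, 0);
        bucketToTaskMap.put(1, 1);
    }

    private void processAllSideEvents(String inputName) {
        // One event per known bucket; an empty map silently yields no events.
        for (Map.Entry<Integer, Integer> e : bucketToTaskMap.entrySet()) {
            sideEvents.add(inputName + ": bucket " + e.getKey()
                    + " -> task " + e.getValue());
        }
    }

    // Runs the two initializer callbacks in the given order and returns how
    // many side-input events were generated.
    static int sideEventsSent(String first, String second) {
        InitOrderSketch cpv = new InitOrderSketch();
        cpv.onRootVertexInitialized(first);
        cpv.onRootVertexInitialized(second);
        return cpv.sideEvents.size();
    }

    public static void main(String[] args) {
        System.out.println("s1 then Map_2: " + sideEventsSent("s1", "Map_2"));
        System.out.println("Map_2 then s1: " + sideEventsSent("Map_2", "s1"));
    }
}
```

In this model the "s1 then Map_2" order yields two side-input events, while
the "Map_2 then s1" order yields none, which matches the hung case where the
MultiMRInput events never reach Tez.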
> Dag gets stuck intermittently
> -----------------------------
>
> Key: TEZ-1635
> URL: https://issues.apache.org/jira/browse/TEZ-1635
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Vikram Dixit K
> Priority: Blocker
> Attachments: Screen Shot 2014-10-05 at 9.46.31 AM.png,
> syslog_dag_1412109415326_0002_10.gz, tez_smb_1_hung_job.log,
> tez_smb_1_successful_job.log
>
>
> Attaching logs for the dag.