[
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376355#comment-15376355
]
Mithun Radhakrishnan commented on TEZ-3336:
-------------------------------------------
Ok, here's what's happening:
{{HiveSplitGenerator}} only comes into play if Hive uses {{HiveInputFormat}}
when generating splits on the AM. It isn't built to handle
{{CombineHiveInputFormat}} at all, which makes sense: regrouping splits that
are already grouped would be silly.
If the user chooses {{CombineHiveInputFormat}}, then Hive's
[{{DagUtils.createVertex()}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L612-L618]
does the following:
{code:java|title=DagUtils.java#L612-L618|borderStyle=solid}
// Not HiveInputFormat, or a custom VertexManager will take care of grouping splits
if (vertexHasCustomInput) {
  dataSource =
      MultiMRInput.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
} else {
  dataSource =
      MRInputLegacy.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
}
{code}
So Hive delegates to Tez's {{MRInputLegacy.createConfigBuilder()}}, which
eventually puts {{MRInput}} and {{MRInputAMSplitGenerator}} in play.
I'm still curious about the nature of the events sent to
{{MRInputAMSplitGenerator}}, and who's sending them. That'll help convince me
that this is indeed a Hive bug. :]
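For illustration, the ordering problem from the issue description can be modeled roughly as below. This is a toy sketch and not Tez or Hive code: {{SplitGeneratorRaceSketch}}, {{Initializer}}, {{EventIntolerantSplitGenerator}}, {{simulate()}} and the exception message are all hypothetical stand-ins. It assumes only what the description states, namely that the split generator does not expect to receive any events, and it says nothing about who actually sends them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins only; none of these types are Tez or Hive classes. */
final class SplitGeneratorRaceSketch {

    /** Models an AM-side input initializer that may be handed events before it finishes. */
    interface Initializer {
        List<String> computeSplits();

        void handleEvents(List<String> events);
    }

    /** Models a split generator that treats any incoming event as an error (assumption). */
    static final class EventIntolerantSplitGenerator implements Initializer {
        @Override
        public List<String> computeSplits() {
            return Arrays.asList("split-0", "split-1");
        }

        @Override
        public void handleEvents(List<String> events) {
            // Assumed behavior, taken from the issue description: no events are expected.
            throw new UnsupportedOperationException("not expecting any events: " + events);
        }
    }

    /**
     * Replays the two orderings deterministically (no real threads).
     *
     * @param upstreamFinishesFirst true if an upstream task completes before splits are ready
     * @return "SUCCEEDED" or "ROOT_INPUT_INIT_FAILURE"
     */
    static String simulate(boolean upstreamFinishesFirst) {
        Initializer initializer = new EventIntolerantSplitGenerator();
        List<String> pendingEvents = new ArrayList<>();
        if (upstreamFinishesFirst) {
            pendingEvents.add("upstream task completed");
        }
        try {
            // Events that arrive while the initializer is still running are delivered to it.
            if (!pendingEvents.isEmpty()) {
                initializer.handleEvents(pendingEvents);
            }
            initializer.computeSplits();
            return "SUCCEEDED";
        } catch (UnsupportedOperationException e) {
            return "ROOT_INPUT_INIT_FAILURE";
        }
    }

    public static void main(String[] args) {
        System.out.println("upstream task finishes first: " + simulate(true));
        System.out.println("splits ready first: " + simulate(false));
    }
}
```

In this model the job fails only when an upstream completion lands before the splits are computed, which would be consistent with the "sometimes fails" behavior in the report.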
> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> -------------------------------------------------------------------
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.1
> Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one
> of the tasks for the other upstream vertex completes then the job can fail
> with an error since MRInputAMSplitGenerator does not expect to receive any
> events.