[
https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375682#comment-15375682
]
Siddharth Seth commented on TEZ-3274:
-------------------------------------
I haven't looked at this in some time. Is this being used with
MRInputSplitDistributor, with the initial parallelism set on the specific
vertex? I don't think using a Root Input along with a ShuffleInput on the same
vertex will work with MRInputAMSplitGenerator, since parallelism is set up at
runtime. Shuffle tasks will see a value of -1 if the initialization takes time.
I believe we never really focused on this case, and if it showed up, it would
need to be handled via a custom VertexManager. If such a manager were to exist,
how would the data distribution be handled? There are different splits for the
MRInput and partitions on the Shuffle side. How are they mapped?
> Vertex with MRInput and shuffle input does not respect slow start
> -----------------------------------------------------------------
>
> Key: TEZ-3274
> URL: https://issues.apache.org/jira/browse/TEZ-3274
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and
> not ShuffleVertexManager) and start containers and tasks immediately. In this
> scenario, resources can be wasted since these vertices do not respect
> tez.shuffle-vertex-manager.min-src-fraction and
> tez.shuffle-vertex-manager.max-src-fraction.
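
For readers unfamiliar with slow start, here is a minimal sketch of the behaviour the two fractions are meant to produce: nothing is scheduled until the minimum fraction of source tasks has completed, everything is scheduled once the maximum fraction is reached, and the count ramps linearly in between. This is not the ShuffleVertexManager code. The class name, the method name, and the sample values are hypothetical and chosen only for illustration.

```java
// Illustrative only: a simplified model of slow start, NOT Tez source code.
// SlowStartSketch and tasksToSchedule are hypothetical names.
final class SlowStartSketch {

    /**
     * Returns how many of this vertex's tasks may be scheduled, given how many
     * source tasks have completed and the configured min/max source fractions.
     */
    static int tasksToSchedule(int totalTasks,
                               int completedSourceTasks,
                               int totalSourceTasks,
                               double minSrcFraction,
                               double maxSrcFraction) {
        if (totalSourceTasks <= 0) {
            // No source tasks to wait for, so nothing is held back.
            return totalTasks;
        }
        double completed = (double) completedSourceTasks / totalSourceTasks;
        if (completed < minSrcFraction) {
            // Below the minimum fraction: hold every task back.
            return 0;
        }
        if (completed >= maxSrcFraction || maxSrcFraction <= minSrcFraction) {
            // At or above the maximum fraction (or no usable ramp): release all.
            return totalTasks;
        }
        // Between min and max: release tasks in proportion to progress.
        double ramp = (completed - minSrcFraction) / (maxSrcFraction - minSrcFraction);
        return (int) (ramp * totalTasks);
    }

    public static void main(String[] args) {
        // Sample values (assumed for illustration): 10 tasks, 100 source tasks.
        for (int done : new int[] {10, 25, 50, 75, 100}) {
            System.out.println(done + " source tasks done -> "
                    + tasksToSchedule(10, done, 100, 0.25, 0.75) + " tasks schedulable");
        }
    }
}
```

A vertex that starts all of its tasks immediately, as described above, behaves as if this function always returned totalTasks, which is where the wasted containers come from.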
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)