[
https://issues.apache.org/jira/browse/TEZ-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091706#comment-14091706
]
Siddharth Seth commented on TEZ-1318:
-------------------------------------
Looks good mostly. Are there restrictions on when parallelism will be inferred
from the DataSourceDescriptor if there are multiple sources? That, along with
VMs needing to set parallelism, should be in the Javadocs.
I think we should set the default to be based on a single node, rather than
what is expected on a cluster. For a cluster, this can always be bumped up via
tez-site.xml, but when trying out Tez, a single node cluster is far more
likely.
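A rough sketch of the lookup order being discussed: an explicit setter wins,
then the value from tez-site.xml, then the YARN minimum allocation mentioned in
the issue description, then a small single-node default. This is only an
illustration. The class, the key name and the 512 MB figure are made up here
and are not Tez APIs or the values in the patch, and whether the YARN fallback
comes before the built-in default is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of Tez: models how a per-task memory value
// could be resolved when the Vertex constructor no longer requires it.
final class TaskMemoryDefaults {
    // Hypothetical configuration key, used only for this illustration.
    static final String TASK_MEMORY_KEY = "example.task.memory.mb";
    // Assumed built-in default, sized for a single-node setup.
    static final int SINGLE_NODE_DEFAULT_MB = 512;

    static int resolveMemoryMb(OptionalInt explicitSetter,
                               Map<String, String> siteConf,
                               OptionalInt yarnMinAllocationMb) {
        // 1. A value set explicitly by an advanced user takes precedence.
        if (explicitSetter.isPresent()) {
            return explicitSetter.getAsInt();
        }
        // 2. Otherwise use the cluster-specific value from the site config.
        String configured = siteConf.get(TASK_MEMORY_KEY);
        if (configured != null) {
            return Integer.parseInt(configured.trim());
        }
        // 3. Otherwise fall back to the YARN minimum allocation, if known.
        if (yarnMinAllocationMb.isPresent()) {
            return yarnMinAllocationMb.getAsInt();
        }
        // 4. Finally use the single-node default.
        return SINGLE_NODE_DEFAULT_MB;
    }

    public static void main(String[] args) {
        Map<String, String> siteConf = new HashMap<>();
        siteConf.put(TASK_MEMORY_KEY, "1024");
        // No explicit setter, so the site config value is used.
        System.out.println(
            resolveMemoryMb(OptionalInt.empty(), siteConf, OptionalInt.of(256)));
    }
}
```

With this ordering, someone trying Tez on a single node gets a small value
without touching any configuration, while a cluster admin only needs to set
one entry in the site file.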
Since many of the examples don't specify an explicit value, does this affect
the unit tests, in terms of how many tasks/threads run in parallel? A sane
value may need to be set in the configuration used by MiniTezCluster to ensure
it runs the desired number of tasks.
> Simplify Vertex constructor
> ---------------------------
>
> Key: TEZ-1318
> URL: https://issues.apache.org/jira/browse/TEZ-1318
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Priority: Blocker
> Attachments: TEZ-1318.1.patch
>
>
> In favor of picking a default specified in tez-site, with a potential
> fallback to the minimum allocation in YARN.
> Typically, this value is cluster specific at least for simple jobs. If more
> advanced users need to change this - it can be done via setters.