[ https://issues.apache.org/jira/browse/TEZ-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091706#comment-14091706 ]

Siddharth Seth commented on TEZ-1318:
-------------------------------------

Looks good mostly. Are there restrictions on when parallelism will be inferred
from the DataSourceDescriptor if there are multiple sources? That, along with
VMs needing to set parallelism, should be in the Javadocs.
I think we should set the default to be based on a single node, rather than on
what is expected on a cluster. For a cluster, this can always be bumped up via
tez-site.xml, but when someone is trying out Tez, a single-node cluster is far
more likely.

Since many of the examples don't specify an explicit value, does this affect
the unit tests in terms of how many tasks/threads run in parallel? A sane
value may need to be set in the configuration used by MiniTezCluster to ensure
it runs the desired number of tasks.

> Simplify Vertex constructor
> ---------------------------
>
>                 Key: TEZ-1318
>                 URL: https://issues.apache.org/jira/browse/TEZ-1318
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-1318.1.patch
>
>
> In favor of picking a default specified in tez-site, with a potential
> fallback to the minimum allocation in YARN.
> Typically, this value is cluster-specific, at least for simple jobs. If more
> advanced users need to change it, that can be done via setters.
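The lookup order in the description (a tez-site default first, then YARN's
minimum allocation) can be sketched as follows. This is a minimal,
hypothetical sketch, not the TEZ-1318 patch: the class and method names are
invented, plain maps stand in for tez-site.xml and yarn-site.xml, and both the
tez-site key name and the final built-in default are assumptions.
`yarn.scheduler.minimum-allocation-mb` is the standard YARN setting, whose
shipped default is 1024 MB.

```java
import java.util.Map;

/**
 * Hypothetical sketch: resolve a default per-task memory size (MB) for a vertex
 * when the caller does not pass one explicitly. Plain maps stand in for
 * tez-site.xml and yarn-site.xml.
 */
public final class ResourceDefaults {
    // Assumed tez-site key for the default task memory; illustrative only.
    static final String TEZ_TASK_MEMORY_KEY = "tez.task.resource.memory.mb";
    // Standard YARN scheduler minimum allocation key.
    static final String YARN_MIN_ALLOC_KEY = "yarn.scheduler.minimum-allocation-mb";
    // Assumed last-resort value, matching YARN's shipped minimum allocation default.
    static final int BUILT_IN_DEFAULT_MB = 1024;

    private ResourceDefaults() {}

    /** Prefer the tez-site value, fall back to YARN's minimum allocation, then to a built-in default. */
    static int resolveTaskMemoryMb(Map<String, String> tezSite, Map<String, String> yarnSite) {
        String fromTez = tezSite.get(TEZ_TASK_MEMORY_KEY);
        if (fromTez != null) {
            return Integer.parseInt(fromTez.trim());
        }
        String fromYarn = yarnSite.get(YARN_MIN_ALLOC_KEY);
        if (fromYarn != null) {
            return Integer.parseInt(fromYarn.trim());
        }
        return BUILT_IN_DEFAULT_MB;
    }

    public static void main(String[] args) {
        // No tez-site value: falls back to YARN's minimum allocation (prints 512).
        System.out.println(resolveTaskMemoryMb(Map.of(), Map.of(YARN_MIN_ALLOC_KEY, "512")));
        // A tez-site value takes precedence (prints 2048).
        System.out.println(resolveTaskMemoryMb(
                Map.of(TEZ_TASK_MEMORY_KEY, "2048"), Map.of(YARN_MIN_ALLOC_KEY, "512")));
    }
}
```

Keeping the lookup in one place means a simplified Vertex constructor could
call it when no resource is given, while advanced users can still override the
value through a setter.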



--
This message was sent by Atlassian JIRA
(v6.2#6252)
