[ 
https://issues.apache.org/jira/browse/PIG-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3795:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to tez branch. Thank you Rohini for the review!

> Parallelism specified by user is not honored if default parallelism is set to 
> a higher value
> --------------------------------------------------------------------------------------------
>
>                 Key: PIG-3795
>                 URL: https://issues.apache.org/jira/browse/PIG-3795
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>         Attachments: PIG-3795-1.patch
>
>
> Let's say you have a query like this-
> {code}
> set default_parallel 200;
> x = cogroup foo by a, bar by b parallel 10;
> y = join x by c, z by d;
> {code}
> I would expect that cogroup has a parallel of 10 while join has a parallel of 
> 200. However, the parallel of cogroup is also set to 200.
> Here is where the default parallelism overwrites the user-specified 
> parallelism.
> {code:title=TezCompiler.java#L390}
>         if (op.getRequestedParallelism() > 
> curTezOp.getRequestedParallelism()) {
>             curTezOp.setRequestedParallelism(op.getRequestedParallelism());
>         }
> {code}
> In the above example, "op" is POLocalRearrange of join, and "curTezOp" is 
> TezOperator that contains both POPackage of cogroup and POLocalRearrange of 
> join.
> Here is what the TezOperator looks like-
> {code}
> |   join_allocs_mop: Local Rearrange[tuple]{long}(false) - scope-134    ->   
> null
> |   |   |
> |   |   Project[long][10] - scope-135
> |   |
> |   |---join_allocs_subscrn: New For Each(true)[bag] - scope-75
> |       |   |
> |       |   POUserFunc(org.apache.pig.scripting.jython.JythonFunction)[bag] - 
> scope-70
> |       |   |
> |       |   |---POUserFunc(org.apache.pig.builtin.TOTUPLE)[tuple] - scope-69
> |       |       |
> |       |       |---Project[bag][0] - scope-67
> |       |       |
> |       |       |---RelationToExpressionProject[bag][*] - scope-68
> |       |           |
> |       |           |---ab_exp_63_day_subscrn_d_ordered: POSort[bag]() - 
> scope-74
> |       |               |   |
> |       |               |   Project[chararray][9] - scope-73
> |       |               |
> |       |               |---Project[bag][1] - scope-72
> |       |
> |       |---New For Each(false,false)[bag] - scope-66
> |           |   |
> |           |   Project[bag][1] - scope-62
> |           |   |
> |           |   Project[bag][2] - scope-64
> |           |
> |           |---abNonmemberByCustomer: Package(Packager)[tuple]{long} - 
> scope-57
> {code}
> The problem is that the parallelism of root (POPackage) is overwritten by 
> that of leaves (POLocalRearrange) because  the latter (200) > the former (10).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to