[
https://issues.apache.org/jira/browse/TEZ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074652#comment-14074652
]
Bikas Saha commented on TEZ-1107:
---------------------------------
This was originally opened by Daniel because Pig was setting an initial
parallelism which could then be increased later on. The approach then changed
to setting initial parallelism to -1 and setting the correct parallelism later
on. So Pig should not need this feature any longer, at least for the original
use case.
In general, this does not need an API change, since the parallelism is already
specified through the API. What is missing is runtime support for the case
where the parallelism actually increases.
> Support increase of parallelism of vertex in case of custom partitioner
> -----------------------------------------------------------------------
>
> Key: TEZ-1107
> URL: https://issues.apache.org/jira/browse/TEZ-1107
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Daniel Dai
> Assignee: Bikas Saha
>
> The current VertexManagerPlugin/EdgeManager mechanism supports decreasing the
> parallelism of a vertex, but increasing it is not supported. In general,
> increasing the parallelism requires repartitioning. However, in my simplified
> case, the preceding vertex uses a custom partitioner that already partitions
> to the final parallelism, so repartitioning is not needed. Even so, I hit an
> exception from the sorter:
> : Caused by: java.io.IOException: Illegal partition for Null: false index: 0 53.8 (2), TotalPartitions: 2
> :     at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.collect(DefaultSorter.java:208)
> :     at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.write(DefaultSorter.java:185)
> :     at org.apache.tez.runtime.library.output.OnFileSortedOutput$1.write(OnFileSortedOutput.java:111)
> :     at org.apache.pig.backend.hadoop.executionengine.tez.POIdentityInOutTez.getNextTuple(POIdentityInOutTez.java:148)
> :     ... 8 more
> While increasing parallelism in general is harder, increasing parallelism
> with a custom partitioner might be easier to support.
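
The exception text in the description reads as if the custom partitioner returned partition index 2 while the output was still configured for only 2 partitions (valid indices 0 and 1). The following is a minimal, self-contained sketch of that kind of mismatch. It is not Tez's DefaultSorter code: the class, the Partitioner interface, and the collect/tryCollect helpers are hypothetical names used only for illustration, and the parallelism values 2 and 3 are assumed.

```java
import java.io.IOException;

class IllegalPartitionSketch {

    /** Hypothetical stand-in for a custom partitioner. */
    interface Partitioner {
        int getPartition(String key);
    }

    /**
     * Hypothetical stand-in for a sorter's bounds check: a record is rejected
     * when its partition index falls outside [0, totalPartitions).
     */
    static void collect(String key, int partition, int totalPartitions) throws IOException {
        if (partition < 0 || partition >= totalPartitions) {
            throw new IOException("Illegal partition for " + key + " (" + partition
                    + "), TotalPartitions: " + totalPartitions);
        }
        // A real sorter would buffer the record for the given partition here.
    }

    /** Runs collect() and returns "ok" or the exception message. */
    static String tryCollect(String key, int partition, int totalPartitions) {
        try {
            collect(key, partition, totalPartitions);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        final int initialParallelism = 2; // assumed: partitions the output was configured with
        final int finalParallelism = 3;   // assumed: partitions the custom partitioner targets

        // The partitioner targets the final parallelism, not the configured one.
        Partitioner custom = key -> key.length() % finalParallelism;

        String key = "ab"; // length 2, so the partitioner returns index 2
        int partition = custom.getPartition(key);

        // Rejected: index 2 is out of range for 2 partitions.
        System.out.println(tryCollect(key, partition, initialParallelism));
        // Accepted once the configured partition count matches the final parallelism.
        System.out.println(tryCollect(key, partition, finalParallelism));
    }
}
```

In this sketch the record is accepted as soon as the configured partition count equals the parallelism the partitioner targets, which matches the description's point that no repartitioning would be needed in that case.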