[ https://issues.apache.org/jira/browse/TEZ-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321156#comment-14321156 ]
Bikas Saha commented on TEZ-2104:
---------------------------------
The edge manager plugin is a user-defined artifact and as such does not have to
be present in the Tez project. Is this model of cross-product portable across
projects like Hive and Pig, so that it could share a common home in Tez?
> A CrossProductEdge which produces synthetic cross-product parallelism
> ---------------------------------------------------------------------
>
> Key: TEZ-2104
> URL: https://issues.apache.org/jira/browse/TEZ-2104
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Gopal V
>
> Instead of duplicating data so that the synthetic cross-product fits into
> partitions, a special-purpose cross-product data movement edge can vastly
> reduce the net IO.
> The Shuffle edge routes each partition's output to a single reducer, while
> the cross-product edge routes it into a matrix of reducers without actually
> duplicating the disk data.
> With 3 partitions on each of the left-hand side (lhs) and right-hand side
> (rhs) of a join, the partitions can be routed into 9 reducers by performing
> a cross-product similar to
> (1,2,3) x (a,b,c) = [(1,a), (1,b), (1,c), (2,a), (2,b) ...]
> This turns a single-task cross-product model into a distributed cross product.
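To make the 3 x 3 routing concrete, here is a minimal standalone Java sketch. It assumes the reducers are laid out as a row-major matrix, so the reducer for the pair (lhs partition i, rhs partition j) is i * numRhs + j. The class and method names are hypothetical, and this is only the index arithmetic, not the Tez EdgeManagerPlugin API. Each source partition is written once and then read by every reducer in its row (lhs) or column (rhs), which is where the IO saving over duplicating the data comes from.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of cross-product routing (NOT the Tez EdgeManagerPlugin API).
 * Assumption: reducers form a row-major numLhs x numRhs matrix.
 */
public class CrossProductRouting {

    // Reducer index for the matrix cell (lhsPartition, rhsPartition).
    static int reducerFor(int lhsPartition, int rhsPartition, int numRhs) {
        return lhsPartition * numRhs + rhsPartition;
    }

    // An lhs partition is read (not copied) by every reducer in its row.
    static List<Integer> reducersForLhs(int lhsPartition, int numRhs) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < numRhs; j++) {
            out.add(reducerFor(lhsPartition, j, numRhs));
        }
        return out;
    }

    // An rhs partition is read (not copied) by every reducer in its column.
    static List<Integer> reducersForRhs(int rhsPartition, int numLhs, int numRhs) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < numLhs; i++) {
            out.add(reducerFor(i, rhsPartition, numRhs));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lhs = {"1", "2", "3"};
        String[] rhs = {"a", "b", "c"};
        // Enumerate (1,2,3) x (a,b,c) and show which reducer owns each pair.
        for (int i = 0; i < lhs.length; i++) {
            for (int j = 0; j < rhs.length; j++) {
                System.out.println("reducer " + reducerFor(i, j, rhs.length)
                        + " <- (" + lhs[i] + "," + rhs[j] + ")");
            }
        }
        System.out.println("lhs partition 1 -> reducers " + reducersForLhs(0, rhs.length));
        System.out.println("rhs partition a -> reducers " + reducersForRhs(0, lhs.length, rhs.length));
    }
}
```

Running `main` lists the 9 pairs in the same order as the example above, e.g. `reducer 0 <- (1,a)` through `reducer 8 <- (3,c)`, and shows that lhs partition 1 fans out to reducers `[0, 1, 2]` while rhs partition a fans out to reducers `[0, 3, 6]`.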
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)