Gopal V created TEZ-2104:
----------------------------

             Summary: A CrossProductEdge which produces synthetic cross-product parallelism
                 Key: TEZ-2104
                 URL: https://issues.apache.org/jira/browse/TEZ-2104
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Gopal V


Instead of writing duplicate copies of the data so that the synthetic 
cross-product fits into partitions, a special-purpose cross-product data 
movement edge can vastly reduce the amount of net IO.

The Shuffle edge routes each partition's output to a single reducer, while the 
cross-product edge routes it into a matrix of reducers without actually 
duplicating the disk data.

A partitioning scheme with 3 partitions on each of the lhs and rhs of a join 
operation can be routed into 9 reducers by performing a cross-product similar to:

(1,2,3) x (a,b,c) = [(1,a), (1,b), (1,c), (2,a), (2,b) ...]

This turns a single-task cross-product model into a distributed cross-product.
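
The routing arithmetic behind this can be sketched roughly as follows. This is 
only an illustration, assuming reducers are numbered in row-major order over the 
(lhs, rhs) partition pairs; the function names are hypothetical and are not part 
of any Tez API.

```python
from itertools import product


def reducer_for(lhs_part, rhs_part, num_rhs):
    """Row-major index of the reducer that joins lhs_part with rhs_part."""
    return lhs_part * num_rhs + rhs_part


def inputs_for(reducer, num_rhs):
    """Inverse mapping: the (lhs, rhs) partition pair a reducer reads."""
    return divmod(reducer, num_rhs)


def lhs_destinations(lhs_part, num_rhs):
    """Reducers that read one lhs partition (one row of the matrix)."""
    return [reducer_for(lhs_part, j, num_rhs) for j in range(num_rhs)]


def rhs_destinations(rhs_part, num_lhs, num_rhs):
    """Reducers that read one rhs partition (one column of the matrix)."""
    return [reducer_for(i, rhs_part, num_rhs) for i in range(num_lhs)]


# 3 lhs partitions x 3 rhs partitions -> 9 reducers (assumed sizes from the example)
NUM_LHS, NUM_RHS = 3, 3
routing = {
    reducer_for(i, j, NUM_RHS): (i, j)
    for i, j in product(range(NUM_LHS), range(NUM_RHS))
}
```

Under this assumed numbering, each partition is written to disk once and read by 
several reducers: an lhs partition feeds one row of the reducer matrix and an 
rhs partition feeds one column.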



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
