[ 
https://issues.apache.org/jira/browse/TEZ-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1499:
----------------------------

    Description: In the current join example, the inputs of JoinProcessor are 
unordered, so it always needs to load one input into memory and stream the 
other. This only fits the case where one dataset is small enough to fit into 
memory (even without broadcast, memory may not be enough). So I'd like to add 
another join example that makes the inputs of JoinProcessor ordered (using 
OrderedPartitionedKVEdgeConfig). This kind of join can be used when both 
datasets are large.  (was: In the current join example, 
the inputs of JoinProcessor is unordered so that it will always need to load 
one input into memory, and stream another input. This only fit for the case 
when one dataset is small enough to fit into memory ( even use no-broadcast, 
memory may not be enough ).  So I'd like to add another join example that make 
the inputs of JoinProcessor is ordered. ( using OrderedPartitionedKVEdgeConfig 
). This kind of join could been used when both of the 2 dataset is large.)

> Add OrderedJoinExample to tez-examples
> --------------------------------------
>
>                 Key: TEZ-1499
>                 URL: https://issues.apache.org/jira/browse/TEZ-1499
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> In the current join example, the inputs of JoinProcessor are unordered, so 
> it always needs to load one input into memory and stream the other. This 
> only fits the case where one dataset is small enough to fit into memory 
> (even without broadcast, memory may not be enough). So I'd like to add 
> another join example that makes the inputs of JoinProcessor ordered (using 
> OrderedPartitionedKVEdgeConfig). This kind of join can be used when both 
> datasets are large.
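
To illustrate why ordered inputs remove the need to hold one side in memory, here is a minimal sketch of a merge join over two key streams that are already sorted. It is plain Java and does not use any Tez API; the class name MergeJoinSketch, the method mergeJoin, and the choice to emit each matching key once are assumptions made for illustration, not the proposed OrderedJoinExample itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of tez-examples.
class MergeJoinSketch {

    // Joins two key streams that are each sorted in ascending order.
    // Only the current key of each side is held while scanning.
    // Assumes keys are never null (null marks an exhausted stream).
    static List<String> mergeJoin(Iterator<String> left, Iterator<String> right) {
        List<String> out = new ArrayList<>();
        String l = left.hasNext() ? left.next() : null;
        String r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp < 0) {
                // Left key is smaller: it cannot match anything later on the right.
                l = left.hasNext() ? left.next() : null;
            } else if (cmp > 0) {
                // Right key is smaller: advance the right side.
                r = right.hasNext() ? right.next() : null;
            } else {
                // Keys match: emit once, then skip repeats of this key on both sides.
                String key = l;
                out.add(key);
                while (l != null && l.equals(key)) {
                    l = left.hasNext() ? left.next() : null;
                }
                while (r != null && r.equals(key)) {
                    r = right.hasNext() ? right.next() : null;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("apple", "banana", "cherry", "fig");
        List<String> b = Arrays.asList("banana", "date", "fig", "grape");
        System.out.println(mergeJoin(a.iterator(), b.iterator())); // prints [banana, fig]
    }
}
```

The sketch collects results in a list for convenience; a real processor would presumably write each match to its output as it is found, so memory use stays independent of the size of either dataset.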



--
This message was sent by Atlassian JIRA
(v6.2#6252)
