[ https://issues.apache.org/jira/browse/TEZ-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-1499:
----------------------------
Summary: Add SortMergeJoinExample to tez-examples (was: Add OrderedJoinExample to tez-examples)
> Add SortMergeJoinExample to tez-examples
> ----------------------------------------
>
> Key: TEZ-1499
> URL: https://issues.apache.org/jira/browse/TEZ-1499
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: Tez-1499-2.patch, Tez-1499.patch
>
>
> In the current join example, the inputs of JoinProcessor are unordered, so it
> always has to load one input into memory and stream the other. This only
> works when one dataset is small enough to fit into memory (even with a
> non-broadcast edge, memory may not be sufficient). So I'd like to add another
> join example in which the inputs of JoinProcessor are ordered (using
> OrderedPartitionedKVEdgeConfig). This kind of join can be used when both
> datasets are large.
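Once both inputs arrive sorted by key, the processor can merge them instead of building an in-memory table for one side. The following is a minimal, standalone Java sketch of that merge step, assuming both inputs are already sorted by key; it does not use any Tez APIs, and the names `SortMergeJoinSketch` and `join` are hypothetical, chosen for illustration only. Plain lists stand in for the two sorted input streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SortMergeJoinSketch {

    /**
     * Inner join of two inputs that are both sorted by key (assumption).
     * Lists stand in for sorted streams; a streaming version would only
     * need to buffer the current run of equal keys, not a whole input.
     */
    public static List<String> join(List<Map.Entry<String, String>> left,
                                    List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String key = left.get(i).getKey();
            int cmp = key.compareTo(right.get(j).getKey());
            if (cmp < 0) {
                i++;                       // left key is smaller: advance left
            } else if (cmp > 0) {
                j++;                       // right key is smaller: advance right
            } else {
                // Equal keys: find the end of the matching run on the right.
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).getKey().equals(key)) {
                    jEnd++;
                }
                // Emit the cross product of the left run and the right run.
                while (i < left.size() && left.get(i).getKey().equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(key + ":" + left.get(i).getValue() + "," + right.get(k).getValue());
                    }
                    i++;
                }
                j = jEnd;                  // skip past the consumed right run
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left =
            List.of(Map.entry("a", "1"), Map.entry("b", "2"), Map.entry("b", "3"));
        List<Map.Entry<String, String>> right =
            List.of(Map.entry("b", "x"), Map.entry("c", "y"));
        System.out.println(join(left, right)); // prints [b:2,x, b:3,x]
    }
}
```

Because each side is read once in key order, the memory needed depends on the size of the largest run of duplicate keys rather than on the size of either dataset, which is why this approach suits the case where both inputs are large.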