Re: Map-Reduce without sorting

Arun C Murthy Wed, 19 Oct 2011 20:09:49 -0700

I'm talking about the Hadoop impl of map-task.
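To illustrate the map-side ordering, here is a minimal, self-contained Java sketch. It is not Hadoop's actual MapTask code. It models a map output buffer that sorts records with the partition as the primary key and the user's key comparator as the secondary key. The names `MapSideSortSketch`, `Rec`, `collect` and `partitionFor` are hypothetical. The `(hashCode & Integer.MAX_VALUE) % numPartitions` partitioning is modelled on Hadoop's HashPartitioner, which is an assumption for the sketch. Even when the key comparator treats every key as equal, the buffer still gets sorted, just by partition alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of map-side buffer ordering; not Hadoop's MapTask code.
class MapSideSortSketch {

    // One buffered map output record, tagged with the reducer partition it goes to.
    record Rec(int partition, String key, int value) {}

    // Assumed hash partitioning, modelled on Hadoop's HashPartitioner.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Tag each map output key with its partition, in emit order.
    static List<Rec> collect(List<String> keys, int numPartitions) {
        List<Rec> buffer = new ArrayList<>();
        for (String k : keys) {
            buffer.add(new Rec(partitionFor(k, numPartitions), k, 1));
        }
        return buffer;
    }

    // Partition is the primary sort key; the user's key comparator only
    // breaks ties inside a partition. Whatever it returns, the sort still runs.
    static List<Rec> sortBuffer(List<Rec> buffer, Comparator<String> keyComparator) {
        List<Rec> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingInt(Rec::partition)
                .thenComparing(Rec::key, keyComparator));
        return sorted;
    }

    public static void main(String[] args) {
        List<Rec> buffer = collect(List.of("b", "a", "d", "c", "e"), 3);
        // An "always equal" key comparator still yields output ordered by partition.
        for (Rec r : sortBuffer(buffer, (k1, k2) -> 0)) {
            System.out.println(r.partition() + " " + r.key());
        }
    }
}
```

Passing `(k1, k2) -> 0` as the key comparator changes only the tie-breaking inside a partition. The comparison-based sort over the whole buffer is still performed.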

On Oct 19, 2011, at 3:45 PM, <[email protected]> <[email protected]> wrote:


> Arun,
> 
> From the Tenzing paper:
> 
> <quote>
> Hash table based aggregation is common in RDBMS systems. However, it is
> impossible to implement efficiently on the basic MapReduce framework,
> since the reducer always unnecessarily sorts the data by key. We enhanced
> the MapReduce framework to relax this restriction so that all values for
> the same key end up in the same reducer shard, but not necessarily in the
> same Reduce() call. This made it possible to completely avoid the sorting
> step on the reducer and implement a pure hash table based aggregation on
> MapReduce. This can have a significant impact on the performance on
> certain types of queries. Due to optimizer limitations, the user must
> explicitly indicate that they want hash based aggregation. A query using
> hash-based aggregation will fail if there is not enough memory for the
> hash table.
> </quote>
> 
> So, you need sorting by partition anyway, which is exactly what would
> happen if I set the key comparator to always return equal.
> 
> - milind
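The hash-based aggregation described in the Tenzing quote can be sketched roughly as follows. This assumes that a reducer shard receives its (key, value) pairs in arbitrary, unsorted order. A `HashMap` accumulates a per-key sum, so no sort by key is needed, but the table must fit in memory. `HashAggregationSketch`, `Pair`, `aggregateShard` and the `maxDistinctKeys` limit are hypothetical. The limit stands in for "not enough memory" and is not Tenzing's or Hadoop's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reducer-side hash aggregation; not Tenzing or Hadoop code.
class HashAggregationSketch {

    record Pair(String key, long value) {}

    // Sums values per key over one reducer shard, consumed in arrival order.
    // Equal keys may be interleaved with other keys; the hash table absorbs
    // that, so no sort by key is needed. maxDistinctKeys is a hypothetical
    // stand-in for available memory: exceeding it fails the query, mirroring
    // the failure mode described in the quote.
    static Map<String, Long> aggregateShard(Iterable<Pair> shardInput, int maxDistinctKeys) {
        Map<String, Long> table = new HashMap<>();
        for (Pair p : shardInput) {
            if (!table.containsKey(p.key()) && table.size() >= maxDistinctKeys) {
                throw new IllegalStateException("hash table full at " + table.size() + " keys");
            }
            table.merge(p.key(), p.value(), Long::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        // Unsorted shard input: "x" appears both before and after "y".
        List<Pair> shard = List.of(new Pair("x", 1), new Pair("y", 2), new Pair("x", 3));
        System.out.println(aggregateShard(shard, 1000));
    }
}
```

The design trades the sort for memory. Cost is one hash lookup per record, but the shard's distinct keys must all be resident at once. That is why a too-large table is treated as a hard failure here rather than spilling.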
> 
> 
> 
> 
> On 10/19/11 3:26 PM, "Arun C Murthy" <[email protected]> wrote:
> 
>> Milind, the map-side sort uses the partition as the primary key. So, you
>> still sort.
>> 
>> See MR-1639 for more details.
>> 
>> On Oct 19, 2011, at 3:22 PM, <[email protected]> wrote:
>> 
>>> How is that different from specifying a comparator that always reports
>>> k1 and k2 as equal, regardless of k1 and k2? Then you will get only
>>> partitioning, and not sorting.
>>> 
>>> - Milind
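This Java sketch shows what an always-equal comparator does to reducer-side grouping. It assumes that consecutive keys the comparator reports as equal are handed to the same reduce() call. As far as is known, Hadoop uses the output key comparator for grouping when no separate grouping comparator is configured, but treat that as an assumption here. `AlwaysEqualComparatorSketch`, `ALWAYS_EQUAL` and `group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: reducer-style grouping driven by a key comparator.
class AlwaysEqualComparatorSketch {

    // The proposal in the thread: never distinguish k1 from k2.
    static final Comparator<String> ALWAYS_EQUAL = (k1, k2) -> 0;

    // Splits a partition's sorted keys into runs of consecutive keys that
    // cmp reports as equal; each run stands for one reduce() call.
    static List<List<String>> group(List<String> sortedKeys, Comparator<String> cmp) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String k : sortedKeys) {
            if (!current.isEmpty() && cmp.compare(current.get(current.size() - 1), k) != 0) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(k);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "b", "c");
        System.out.println(group(keys, Comparator.naturalOrder()).size()); // 3
        System.out.println(group(keys, ALWAYS_EQUAL).size());              // 1
    }
}
```

With `ALWAYS_EQUAL`, a whole partition collapses into a single group, so each reducer would see one reduce() call with keys in arbitrary order. The sort pass itself still executes, ordering only by partition, so the comparator trick removes the key ordering but not the cost of sorting.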
>>> 
>>> 
>>> On 10/19/11 2:42 PM, "Zheng Shao" <[email protected]> wrote:
>>> 
>>>> Google's Tenzing paper mentioned that they modified MR to make sorting
>>>> in the reducer optional:
>>>> 
>>>> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/37200.pdf
>>>> 
>>>> Is there any plan to support that in MR 2.0?
>>>> 
>>>> Zheng
>>> 
>> 
>> 
>
