Arun,

>From the Tenzing paper:

<quote>
Hash table based aggregation is common in RDBMS sys-
tems. However, it is impossible to implement eciently
on the basic MapReduce framework, since the reducer al-
ways unnecessarily sorts the data by key. We enhanced the
MapReduce framework to relax this restriction so that all
values for the same key end up in the same reducer shard,
but not necessarily in the same Reduce() call. This made
it possible to completely avoid the sorting step on the re-
ducer and implement a pure hash table based aggregation
on MapReduce. This can have a signicant impact on the
performance on certain types of queries. Due to optimizer
limitations, the user must explicitly indicate that they want
hash based aggregation. A query using hash-based aggre-
gation will fail if there is not enough memory for the hash
table.
</quote>

So, you need sorting by partition anyway, which is exactly what would
happen if I set key comparator to return equals always.

- milind




On 10/19/11 3:26 PM, "Arun C Murthy" <[email protected]> wrote:

>Milind the map-side sort uses the partiion as the primary key. So, you
>still sort.
>
>See MR-1639 for more details.
>
>On Oct 19, 2011, at 3:22 PM, <[email protected]> wrote:
>
>> How is that different from specifying a comparator that always returns
>> that k1 and k2 are equal regardless of k1 and k2 ? So, you will get only
>> partitioning, and not sorting.
>> 
>> - Milind
>> 
>> 
>> On 10/19/11 2:42 PM, "Zheng Shao" <[email protected]> wrote:
>> 
>>> Google's Tenzing paper mentioned that they modified MR to make sorting
>>>in
>>> reducer optional:
>>> 
>>>http://static.googleusercontent.com/external_content/untrusted_dlcp/rese
>>>ar
>>> ch.google.com/en/us/pubs/archive/37200.pdf
>>> 
>>> Is there any plan to support that in MR 2.0?
>>> 
>>> Zheng
>> 
>
>

Reply via email to