Hi,

What do I have to do to ensure that the sort between map and combine is 
stable?

I want to build a reverse index and have a mapper and combiner like:

def map(id, text):
    for term in tokenize(text):
        yield term, id

def combine(term, ids):
    yield term, compress(ids)

The compress function needs the ids to be sorted. Since the ids are sorted in 
the input to map, the ids would also be sorted when handed to combine, 
provided that the sorting between map and combine is stable.
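
To make clear what I mean by "stable", here is a small local sketch in plain 
Python (no Hadoop or dumbo involved). The whitespace tokenize and the sample 
docs are made-up stand-ins, not my real code. Python's sorted() is stable, so 
sorting the pairs by term alone keeps the ids in input order, which is the 
behaviour I was hoping for between map and combine:

```python
from itertools import groupby
from operator import itemgetter

def tokenize(text):
    # Hypothetical stand-in for the real tokenizer: split on whitespace.
    return text.split()

def map_doc(doc_id, text):
    # Same shape as the mapper above: emit (term, id) for every term.
    for term in tokenize(text):
        yield term, doc_id

# Made-up input, already ordered by id, as in my table scan.
docs = [(1, "b a"), (2, "a b"), (3, "a")]
pairs = [kv for doc_id, text in docs for kv in map_doc(doc_id, text)]

# Sort by term ONLY. Because sorted() is stable, pairs with the same
# term keep their original relative order, i.e. ascending id.
pairs_sorted = sorted(pairs, key=itemgetter(0))

# Group the ids per term, as the combiner would receive them.
grouped = {
    term: [doc_id for _, doc_id in group]
    for term, group in groupby(pairs_sorted, key=itemgetter(0))
}
print(grouped)
```

With a sort that is not stable, the id lists per term could come out in any 
order, which is what I seem to be seeing.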

But in my current experiments, combine receives the ids in no particular order.
(I read an HBase table and do the map and reduce with dumbo: Hadoop 0.20, HBase 
0.20 and dumbo 0.21.)

I am a newbie to Hadoop and would be thankful if anybody could point me in the 
right direction.

Bye,

Mat
