combiner without reducer

Amogh Vasekar Thu, 20 Nov 2008 21:52:00 -0800

Hi,
I believe a combiner is currently not run unless you have at least one
reducer set.
Leaving aside the Hadoop-18 semantics of the combiner running on both
sides (the number of reducers is 0 anyway, so I guess the merge-combine
doesn't come into the picture at all), I have a use case where I would
like to run a combiner without a reducer.
Basically, the aggregation I do (a lookup sort of thing) depends on a
relatively small dataset and is independent of the records in the map
input data that form the input dataset, hence the motivation for
combine-without-reduce.
What I wanted to do was aggregate the similar records in the combiner
(or in a particular instance of the combiner) in a single shot, with
this forming my output. This would save me the intermediate I/O involved
in the shuffle and sort (S&S) phase, at some partial I/O cost on the
map + combine side, and I just wanted to try it out to see if it's
feasible at all.
Given that a combiner without a reducer is not supported, I was thinking
of doing it the way Hadoop would: create a buffer, sort it, and combine
as I flush.
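
Roughly, the buffer / sort / combine-on-flush idea could look like the
sketch below. It is plain Java with no Hadoop dependency, just to show
the shape of it. The class name CombiningBuffer, the maxRecords
threshold, and the combiner and sink callbacks are all hypothetical
names made up for illustration, not Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

/** Hypothetical helper: buffers records, then sorts and combines them on flush. */
class CombiningBuffer<K extends Comparable<K>, V> {
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
    private final int maxRecords;            // assumed flush threshold
    private final BinaryOperator<V> combiner; // merges two values of the same key
    private final BiConsumer<K, V> sink;      // receives the combined output

    CombiningBuffer(int maxRecords, BinaryOperator<V> combiner, BiConsumer<K, V> sink) {
        this.maxRecords = maxRecords;
        this.combiner = combiner;
        this.sink = sink;
    }

    /** Buffer one record; flush once the buffer holds maxRecords records. */
    void collect(K key, V value) {
        buffer.add(new SimpleEntry<>(key, value));
        if (buffer.size() >= maxRecords) {
            flush();
        }
    }

    /** Sort the buffer by key, combine each run of equal keys, emit, and clear. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        buffer.sort(Map.Entry.comparingByKey());
        K currentKey = buffer.get(0).getKey();
        V combined = buffer.get(0).getValue();
        for (int i = 1; i < buffer.size(); i++) {
            Map.Entry<K, V> entry = buffer.get(i);
            if (entry.getKey().compareTo(currentKey) == 0) {
                // Same key as the current run: fold the value in.
                combined = combiner.apply(combined, entry.getValue());
            } else {
                // Key changed: emit the finished run and start a new one.
                sink.accept(currentKey, combined);
                currentKey = entry.getKey();
                combined = entry.getValue();
            }
        }
        sink.accept(currentKey, combined);
        buffer.clear();
    }
}
```

In a map-only job, collect would be called for every record the map
function produces, with one final flush when the task finishes. Note
that a key is only fully aggregated if all of its records land in the
same flush; otherwise it shows up once per flush in the output.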
Any thoughts on this would be really helpful.


Thanks,
Amogh
