--- On Thu, 11/20/08, Amogh Vasekar <[EMAIL PROTECTED]> wrote:

> From: Amogh Vasekar <[EMAIL PROTECTED]>
> Subject: combiner without reducer
> To: [EMAIL PROTECTED], [email protected]
> Date: Thursday, November 20, 2008, 9:48 PM
> Hi,
> I believe a combiner is currently not run unless you have at least one
> reducer set.
> Not getting into the Hadoop 0.18 semantics of the combiner running on both
> sides (the number of reducers is 0 anyway, so I guess the merge-combine
> doesn't come into the picture at all), I have a use case where I would
> like to run a combiner without a reducer. Basically, the aggregation (a
> lookup sort of thing) I do depends on a relatively small dataset, and the
> aggregation is independent of the records in the map input data forming
> the input dataset; hence the motivation for combine-without-reduce.
> What I wanted to do was aggregate the similar records in the combiner (or
> in a particular instance of the combiner) in a single shot, thus forming
> my output. This would save me the intermediate I/O involved in the
> shuffle-and-sort (S&S) phase, at some partial I/O cost on the map +
> combine side, and I just wanted to try it out to see if it's feasible at
> all.
> Given that a combiner without a reducer is not supported, I was thinking
> of doing it the way Hadoop itself would: create a buffer, sort, and
> combine as I flush.
> Any thoughts on this would be really helpful.
> 
> Thanks,
> Amogh
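A minimal, plain-Java sketch of the "create a buffer, sort, combine as I flush" idea described above. It is not Hadoop code: `BufferedCombiner`, `collect`, and `flush` are hypothetical names, and the `emit` callback stands in for whatever the mapper uses to write its output (for example an `OutputCollector`). It assumes Java 9+ for `Map.entry` and non-null keys and values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

/**
 * Hypothetical helper, not a Hadoop API: buffers map output records and,
 * on flush, sorts them by key, merges runs of equal keys with a
 * user-supplied combine function, and emits one record per key.
 */
class BufferedCombiner<K extends Comparable<K>, V> {
    private final int capacity;                 // flush once this many records are buffered
    private final BinaryOperator<V> combine;    // e.g. Integer::sum
    private final BiConsumer<K, V> emit;        // stand-in for the mapper's output sink
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();

    BufferedCombiner(int capacity, BinaryOperator<V> combine, BiConsumer<K, V> emit) {
        this.capacity = capacity;
        this.combine = combine;
        this.emit = emit;
    }

    /** Called once per map output record; flushes when the buffer is full. */
    void collect(K key, V value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Sorts the buffer, combines adjacent equal keys, emits them, and clears the buffer. */
    void flush() {
        buffer.sort(Map.Entry.comparingByKey());
        K currentKey = null;
        V acc = null;
        for (Map.Entry<K, V> e : buffer) {
            if (currentKey != null && currentKey.compareTo(e.getKey()) == 0) {
                acc = combine.apply(acc, e.getValue());   // same key: fold into accumulator
            } else {
                if (currentKey != null) {
                    emit.accept(currentKey, acc);         // key changed: emit previous run
                }
                currentKey = e.getKey();
                acc = e.getValue();
            }
        }
        if (currentKey != null) {
            emit.accept(currentKey, acc);                 // emit the last run
        }
        buffer.clear();
    }
}
```

The mapper would call `collect` from `map()` and a final `flush()` from its `close()` method. Note that each flush only combines what is currently buffered, so a key can show up again in a later flush (and in other map tasks' output); with zero reducers nothing merges those partial results, so the output is fully aggregated per key only if a task's records for that key all fit in one flush.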

