Jeff,

Doesn't the reducer see all of the data points for each cluster (canopy) in
a single list?

If so, why the need to output during close?

If not, why not?
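
To make the question concrete, here is a minimal sketch of the "yes" case. It assumes the job is keyed by canopy id, so that each reduce call receives every point for that canopy. The class name `InReduceCentroid`, the `demo()` helper, and the use of `BiConsumer` as a stand-in for Hadoop's OutputCollector are all illustrative assumptions, not code from Jeff's job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch: not Hadoop code, and not Jeff's implementation.
class InReduceCentroid {

    // Called once per cluster key with every point assigned to that key.
    // BiConsumer stands in for Hadoop's OutputCollector (assumption).
    static void reduce(String clusterId, Iterator<double[]> points,
                       BiConsumer<String, double[]> output) {
        double[] sum = null;
        int count = 0;
        while (points.hasNext()) {
            double[] p = points.next();
            if (sum == null) {
                sum = new double[p.length];
            }
            for (int i = 0; i < p.length; i++) {
                sum[i] += p[i];
            }
            count++;
        }
        if (count == 0) {
            return; // no points for this key, nothing to emit
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count;
        }
        // The centroid is complete here, so it is emitted right away
        // and nothing has to be held back until close().
        output.accept(clusterId, sum);
    }

    // Small driver that feeds one key's points through reduce().
    static List<String> demo() {
        List<String> out = new ArrayList<>();
        List<double[]> pts = Arrays.asList(new double[]{0, 0}, new double[]{2, 4});
        reduce("canopy-1", pts.iterator(),
               (k, c) -> out.add(k + " " + Arrays.toString(c)));
        return out;
    }
}
```

If the points for one canopy are instead spread across several reduce calls, this does not apply and the output has to wait for close().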


On 2/11/08 12:24 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:

> Hi Owen,
> 
> Thanks for the information. I took Ted's advice and refactored my mapper
> so as to use a combiner and that solved my front-end canopy generation
> problem, but I still have to output the final canopies in the reducer
> during close() since there is no similar combiner mechanism. I was
> worried about this, but now I'm not.
> 
> Thanks,
> Jeff
> 
> -----Original Message-----
> From: Owen O'Malley [mailto:[EMAIL PROTECTED]
> Sent: Monday, February 11, 2008 10:40 AM
> To: [email protected]
> Subject: Re: Best Practice?
> 
> 
> On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:
> 
>> I'm trying to wait until close() to output the cluster centroids to the
>> reducer, but the OutputCollector is not available.
> 
> You hit on exactly the right solution. Actually, because of Pipes and
> Streaming, you have a lot more guarantees than you would expect. In
> particular, you can call output.collect when the framework is between
> calls to map or reduce up until the close finishes.
> 
> -- Owen
> 
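
The pattern Owen describes (keep a reference to the collector handed to reduce, then call collect on it from close) can be sketched roughly as follows. The `Collector` interface is a hypothetical stand-in for Hadoop's OutputCollector so that the example compiles on its own. The class names, the 1-D points, and the "centroid-N" keys are likewise assumptions made for illustration.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hadoop's OutputCollector (assumption).
interface Collector<K, V> {
    void collect(K key, V value);
}

// Reducer-like class that defers all of its output until close().
class DeferredOutputReducer implements Closeable {
    private Collector<String, Double> saved; // captured during reduce()
    private final List<Double> centroids = new ArrayList<>();

    void reduce(String key, Iterator<Double> values,
                Collector<String, Double> output) {
        saved = output; // remember the collector for use in close()
        double sum = 0;
        int n = 0;
        while (values.hasNext()) {
            sum += values.next();
            n++;
        }
        if (n > 0) {
            centroids.add(sum / n); // accumulate, do not emit yet
        }
    }

    @Override
    public void close() {
        if (saved == null) {
            return; // reduce() was never called, so there is no collector
        }
        for (int i = 0; i < centroids.size(); i++) {
            saved.collect("centroid-" + i, centroids.get(i));
        }
    }
}

class DeferredOutputDemo {
    // Drives two reduce() calls and then close(), recording what is emitted.
    static List<String> run() {
        List<String> out = new ArrayList<>();
        Collector<String, Double> collector = (k, v) -> out.add(k + "=" + v);
        DeferredOutputReducer r = new DeferredOutputReducer();
        r.reduce("a", Arrays.asList(1.0, 3.0).iterator(), collector);
        r.reduce("b", Arrays.asList(10.0).iterator(), collector);
        r.close();
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Nothing is emitted until close() runs, so a task that never receives a reduce call has no saved collector and produces no output. The null check covers that case.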
