You're right again. Once the reducer has clustered all its input canopy
centroids it is done and can collect the resulting canopies to output. I
guess I was just wedged in that close() pattern.

Thanks,
Jeff

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 11, 2008 12:40 PM
To: [email protected]
Subject: Re: Best Practice?


Jeff,

Doesn't the reducer see all of the data points for each cluster (canopy) in
a single list?

If so, why the need to output during close?

If not, why not?
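A minimal sketch of Ted's point, assuming the reducer is keyed by canopy so that every point for that canopy arrives in one reduce() call. In that case the centroid can be emitted inside reduce() itself, with no need to wait for close(). The Collector interface, the canopy ids, and the plain mean-of-points centroid are hypothetical stand-ins. Collector has the same collect(key, value) shape as Hadoop's old-API org.apache.hadoop.mapred.OutputCollector, but the Reporter argument and IOException are omitted so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CanopyReduceSketch {

    // Hypothetical stand-in with the same collect(key, value) shape as
    // Hadoop's old-API OutputCollector (no IOException, no Reporter here).
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical reducer: all points grouped under one canopy key are
    // handed over in a single iterator, so the centroid is emitted right here.
    static void reduce(String canopyId, Iterator<double[]> points,
                       Collector<String, double[]> output) {
        double[] sum = null;
        int count = 0;
        while (points.hasNext()) {
            double[] p = points.next();
            if (sum == null) {
                sum = new double[p.length];
            }
            for (int i = 0; i < p.length; i++) {
                sum[i] += p[i];
            }
            count++;
        }
        if (count == 0) {
            return; // nothing was grouped under this key
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count; // mean of the points = centroid
        }
        // Every point for this key has been seen, so emit now, not in close().
        output.collect(canopyId, sum);
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Collector<String, double[]> out =
            (k, v) -> emitted.add(k + "=" + Arrays.toString(v));
        reduce("C0", Arrays.asList(new double[]{1, 2}, new double[]{3, 4}).iterator(), out);
        System.out.println(emitted);
    }
}
```

Emitting from reduce() keeps each key's work self-contained; the close() pattern is only needed when output depends on state gathered across several reduce() calls.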


On 2/11/08 12:24 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:

> Hi Owen,
> 
> Thanks for the information. I took Ted's advice and refactored my mapper
> so as to use a combiner and that solved my front-end canopy generation
> problem, but I still have to output the final canopies in the reducer
> during close() since there is no similar combiner mechanism. I was
> worried about this, but now I won't.
> 
> Thanks,
> Jeff
> 
> -----Original Message-----
> From: Owen O'Malley [mailto:[EMAIL PROTECTED]
> Sent: Monday, February 11, 2008 10:40 AM
> To: [email protected]
> Subject: Re: Best Practice?
> 
> 
> On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:
> 
>> I'm trying to wait until close() to output the cluster centroids to the
>> reducer, but the OutputCollector is not available.
> 
> You hit on exactly the right solution. Actually, because of Pipes and
> Streaming, you have a lot more guarantees than you would expect. In
> particular, you can call output.collect when the framework is between
> calls to map or reduce up until the close finishes.
> 
> -- Owen
> 
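A minimal sketch of the pattern Owen describes, assuming a mapper that remembers the collector it was handed in map() and emits its accumulated canopy centers in close(). The Collector interface, the 1-D points, the T2 threshold, and the "centroid" key are hypothetical. Collector only mirrors the collect(key, value) shape of Hadoop's old-API OutputCollector, and the canopy rule is simplified: a point starts a new canopy unless it lies within T2 of an existing center.

```java
import java.util.ArrayList;
import java.util.List;

public class CanopyMapperSketch {

    // Hypothetical stand-in with the same collect(key, value) shape as
    // Hadoop's old-API OutputCollector (no IOException, no Reporter here).
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical threshold: a point closer than T2 to an existing center
    // does not start a new canopy (simplified rule, 1-D points for brevity).
    static final double T2 = 1.0;

    private final List<Double> centers = new ArrayList<>();
    private Collector<String, Double> output; // remembered from map()

    void map(double point, Collector<String, Double> output) {
        this.output = output; // stash the collector for use in close()
        for (double c : centers) {
            if (Math.abs(point - c) < T2) {
                return; // already covered by an existing canopy
            }
        }
        centers.add(point); // this point becomes a new canopy center
    }

    void close() {
        if (output == null) {
            return; // map() was never called, so there is nothing to emit
        }
        // Per Owen's note, collect() remains usable until close() finishes.
        for (double c : centers) {
            output.collect("centroid", c);
        }
    }

    public static void main(String[] args) {
        List<Double> emitted = new ArrayList<>();
        CanopyMapperSketch mapper = new CanopyMapperSketch();
        Collector<String, Double> out = (k, v) -> emitted.add(v);
        for (double p : new double[]{0.0, 0.5, 3.0, 3.2, 10.0}) {
            mapper.map(p, out);
        }
        mapper.close();
        System.out.println(emitted);
    }
}
```

Nothing is emitted during map(); all centers go out in one batch from close(), which is why the mapper needs to hold on to the collector reference.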
