Jeff,

Doesn't the reducer see all of the data points for each cluster (canopy) in a single list? If so, why the need to output during close? If not, why not?

On 2/11/08 12:24 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:

> Hi Owen,
>
> Thanks for the information. I took Ted's advice and refactored my mapper
> so as to use a combiner and that solved my front-end canopy generation
> problem, but I still have to output the final canopies in the reducer
> during close() since there is no similar combiner mechanism. I was
> worried about this, but now I won't.
>
> Thanks,
> Jeff
>
> -----Original Message-----
> From: Owen O'Malley [mailto:[EMAIL PROTECTED]
> Sent: Monday, February 11, 2008 10:40 AM
> To: [email protected]
> Subject: Re: Best Practice?
>
> On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:
>
>> I'm trying to wait until close() to output the cluster centroids to
>> the reducer, but the OutputCollector is not available.
>
> You hit on exactly the right solution. Actually, because of Pipes and
> Streaming, you have a lot more guarantees than you would expect. In
> particular, you can call output.collect when the framework is between
> calls to map or reduce up until the close finishes.
>
> -- Owen
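
For reference, the pattern discussed in this thread (keep the collector handed to reduce() and use it once more in close()) can be sketched roughly as below. This is a dependency-free illustration, not Hadoop code: Collector, CentroidReducer, and CloseTimeOutputSketch are hypothetical names standing in for the framework's types, and the running mean is only a placeholder for real canopy or centroid computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the framework's output collector (assumption, not the real interface).
interface Collector<K, V> {
    void collect(K key, V value);
}

// Hypothetical reducer: accumulates across all reduce() calls and emits only in close().
class CentroidReducer {
    private Collector<String, Double> savedOutput; // reference captured during reduce()
    private double sum = 0.0;
    private long count = 0;

    // Same general shape as reduce(key, values, output): accumulate, emit nothing yet.
    void reduce(String key, Iterator<Double> values, Collector<String, Double> output) {
        savedOutput = output;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
    }

    // close() is handed no collector, so it reuses the one saved in reduce().
    void close() {
        if (savedOutput != null && count > 0) {
            savedOutput.collect("centroid", sum / count);
        }
    }
}

class CloseTimeOutputSketch {
    // Drives the reducer the way a framework would: reduce() per key, then close() once.
    static List<String> run() {
        final List<String> emitted = new ArrayList<>();
        Collector<String, Double> out = (k, v) -> emitted.add(k + "=" + v);
        CentroidReducer reducer = new CentroidReducer();
        reducer.reduce("a", Arrays.asList(1.0, 2.0).iterator(), out);
        reducer.reduce("b", Arrays.asList(3.0, 6.0).iterator(), out);
        reducer.close(); // the only point at which a record is emitted
        return emitted;
    }
}
```

Nothing is written during the two reduce() calls; the single record appears only when close() runs, which is the window Owen describes as still safe for output.collect.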
