Oops. Sorry about that! The issue seems to be that ReviewBoard automatically CCs the dev list but it's apparently not subscribed.
On Fri, Mar 29, 2013 at 6:36 PM, Otis Gospodnetic <[email protected]> wrote:

> FYI, I'm getting a lot of these (and not moderating any more due to lack of
> time)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
> ---------- Forwarded message ----------
> From: <dev-reject-1364573050.63309.haimnphidmmapikej...@mahout.apache.org>
> Date: Fri, Mar 29, 2013 at 12:04 PM
> Subject: MODERATE for [email protected]
> To:
> Cc: dev-allow-tc.1364573050.abpdchciinoejcdfjbch-noreply=
> [email protected]
>
>
> To approve:
>   dev-accept-1364573050.63309.haimnphidmmapikej...@mahout.apache.org
> To reject:
>   dev-reject-1364573050.63309.haimnphidmmapikej...@mahout.apache.org
> To give a reason to reject:
> %%% Start comment
> %%% End comment
>
>
> ---------- Forwarded message ----------
> From: "Dan Filimon" <[email protected]>
> To: "Sebastian Schelter" <[email protected]>, "Ted Dunning" <[email protected]>
> Cc: "Dan Filimon" <[email protected]>, "mahout" <[email protected]>
> Date: Fri, 29 Mar 2013 16:04:08 -0000
> Subject: Re: Review Request: MAHOUT-1181: Adds StreamingKMeans MapReduce
> classes
>
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10193/
>
> On March 29th, 2013, 1:48 p.m., *Sebastian Schelter* wrote:
>
> core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansDriver.java
> <https://reviews.apache.org/r/10193/diff/1/?file=276345#file276345line203> (Diff revision 1)
>
> None
>
> {'text': ' private void configureOptionsForWorkers() throws
> ClassNotFoundException, IllegalAccessException,', 'line': 175}
>
> 203
>
> log.info("No measure class given, using EuclideanDistanceMeasure");
>
> Why not make euclidean distance the default value of the distance
> measure option?
>
> I forgot to do that myself because the option is in
> DefaultOptionCreator.
> Fortunately, the default set there,
> SquaredEuclideanDistance is a great default, probably better than
> EuclideanDistance. So, I just removed this chunk of code entirely.
>
>
> On March 29th, 2013, 1:48 p.m., *Sebastian Schelter* wrote:
>
> core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansDriver.java
> <https://reviews.apache.org/r/10193/diff/1/?file=276345#file276345line309> (Diff revision 1)
>
> None
>
> {'text': ' private void configureOptionsForWorkers() throws
> ClassNotFoundException, IllegalAccessException,', 'line': 175}
>
> 309
>
> log.error("Measure class not found " + measureClass, e);
>
> program should throw an exception and terminate if the distance
> measure class cannot be found, right?
>
> Indeed. I removed the try/catch.
>
>
> On March 29th, 2013, 1:48 p.m., *Sebastian Schelter* wrote:
>
> core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansDriver.java
> <https://reviews.apache.org/r/10193/diff/1/?file=276345#file276345line315> (Diff revision 1)
>
> None
>
> {'text': ' private void configureOptionsForWorkers() throws
> ClassNotFoundException, IllegalAccessException,', 'line': 175}
>
> 315
>
> log.error("Searcher class not found " + measureClass, e);
>
> program should throw an exception and terminate if the searcher
> class cannot be found, right?
>
> Yep, same as above.
>
>
> - Dan
>
> On March 29th, 2013, 4:03 p.m., Dan Filimon wrote:
> Review request for mahout, Ted Dunning and Sebastian Schelter.
> By Dan Filimon.
>
> *Updated March 29, 2013, 4:03 p.m.*
> Description
>
> This depends (loosely) on https://reviews.apache.org/r/10194/
>
> This patch implements the MapReduce version of StreamingKMeans for
> MAHOUT-1154.
>
> It adds 5 new classes:
> - CentroidWritable: class representing a centroid that can be written
>   to a SeqFile
> - StreamingKMeansDriver: class implementing AbstractJob that is the
>   entry point to the mapreduction
> - StreamingKMeansMapper: mapper, running StreamingKMeans (see
>   MAHOUT-1162) clustering the points one by one
> - StreamingKMeansReducer: reducer, running BallKMeans (see
>   MAHOUT-1162) a number of times and picking the clustering with the
>   lowest total clustering cost.
>   The cost is determined by randomly splitting the incoming centroids
>   into a "training" and "test" set, computing the centroids on the
>   training set and the cost on the test set. The intent is to see
>   whether the centroids actually describe the distribution of the points
>   or not.
> - StreamingKMeansUtilsMR: helper class with a method to instantiate a
>   searcher from a Configuration.
>
> Additionally, there is a test class StreamingKMeansTestMR that tests
> the mapper, reducer and mapper and reducer together using MRUnit.
>
> !!!
> Since MRUnit is now a dependency, the core pom.xml file adds MRUnit as
> a dependency. We depend on snapshot 1.0 which is not yet released (it
> will be very soon), hence the updated pom.xml is not provided for now.
> !!!
>
> Testing
>
> See StreamingKMeansTestMR for the tests. These are all performed on
> data sampled from a "hypercube" distribution (there are multinormal
> distributions in each vertex of the cube).
> Additionally there are ongoing tests on the 20 newsgroups data set
> (and some more are on the way).
>
> Diffs
>
> - core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/CentroidWritable.java (PRE-CREATION)
> - core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansDriver.java (PRE-CREATION)
> - core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansMapper.java (PRE-CREATION)
> - core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansReducer.java (PRE-CREATION)
> - core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansUtilsMR.java (PRE-CREATION)
> - core/src/test/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansTestMR.java (PRE-CREATION)
> - src/conf/driver.classes.default.props (ac45eef)
>
> View Diff <https://reviews.apache.org/r/10193/diff/>
>
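
For anyone skimming the quoted description, here is a rough sketch of the selection idea described for StreamingKMeansReducer: cluster on a random "training" part of the incoming centroids, score the result on the held-out "test" part, and keep the run with the lowest test cost. This is not the code from the patch. The class HoldoutClusteringSketch and all of its methods are hypothetical, plain Lloyd iterations stand in for BallKMeans, points are bare double[] arrays rather than Mahout centroids, and drawing a fresh split for every run is my assumption (the description does not say whether the split is redone per run).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; none of these names exist in Mahout. */
public class HoldoutClusteringSketch {

  /** Squared Euclidean distance between two points of equal dimension. */
  public static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Total cost: for each point, the squared distance to its nearest center. */
  public static double cost(List<double[]> points, List<double[]> centers) {
    double total = 0;
    for (double[] p : points) {
      double nearest = Double.POSITIVE_INFINITY;
      for (double[] c : centers) {
        nearest = Math.min(nearest, squaredDistance(p, c));
      }
      total += nearest;
    }
    return total;
  }

  /** Stand-in for BallKMeans: Lloyd iterations seeded with k random training points. */
  public static List<double[]> cluster(List<double[]> train, int k, int iterations, Random rng) {
    List<double[]> shuffled = new ArrayList<>(train);
    Collections.shuffle(shuffled, rng);
    List<double[]> centers = new ArrayList<>();
    for (int i = 0; i < k; i++) {
      centers.add(shuffled.get(i).clone()); // assumes train.size() >= k
    }
    int dim = train.get(0).length;
    for (int it = 0; it < iterations; it++) {
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (double[] p : train) {
        int nearest = 0;
        for (int j = 1; j < k; j++) {
          if (squaredDistance(p, centers.get(j)) < squaredDistance(p, centers.get(nearest))) {
            nearest = j;
          }
        }
        counts[nearest]++;
        for (int d = 0; d < dim; d++) {
          sums[nearest][d] += p[d];
        }
      }
      for (int j = 0; j < k; j++) {
        if (counts[j] == 0) {
          continue; // an empty cluster keeps its previous center
        }
        for (int d = 0; d < dim; d++) {
          centers.get(j)[d] = sums[j][d] / counts[j];
        }
      }
    }
    return centers;
  }

  /** Runs several clusterings, each on a fresh random split, and keeps the lowest test cost. */
  public static List<double[]> bestOfRuns(
      List<double[]> points, int k, int runs, double testFraction, long seed) {
    Random rng = new Random(seed);
    List<double[]> best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int r = 0; r < runs; r++) {
      List<double[]> shuffled = new ArrayList<>(points);
      Collections.shuffle(shuffled, rng);
      int testSize = (int) (shuffled.size() * testFraction);
      List<double[]> test = shuffled.subList(0, testSize);
      List<double[]> train = shuffled.subList(testSize, shuffled.size());
      List<double[]> centers = cluster(train, k, 10, rng);
      double testCost = cost(test, centers); // scored on points the run never saw
      if (testCost < bestCost) {
        bestCost = testCost;
        best = centers;
      }
    }
    return best;
  }
}
```

Scoring on held-out points rather than on the training points is what lets the cost say something about whether the centers describe the distribution, which is the stated intent in the description.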
