> On 2012-03-08 23:42:23, Dmitriy Lyubimov wrote: > > trunk/core/src/main/java/org/apache/mahout/common/HadoopUtil.java, line 242 > > <https://reviews.apache.org/r/4237/diff/1/?file=88913#file88913line242> > > > > Extra white spaces generate style warnings but are notoriously hard to > > eliminate. I have no problems with them but it was a subject of review > > notes from other committers before, I am just reproducing collective mind > > here.
Good point - I'll address this before committing. > On 2012-03-08 23:42:23, Dmitriy Lyubimov wrote: > > trunk/core/src/main/java/org/apache/mahout/common/HadoopUtil.java, line 247 > > <https://reviews.apache.org/r/4237/diff/1/?file=88913#file88913line247> > > > > is this really an issue with Hadoop 0.20.203? i don't think we care > > about classic hadoop 0.20.2 anymore. Hmm, I am not sure, but I think this is a comment we can omit. Without the new listStatus methods (but with the changes to other files), there is at least one test that won't compile. I'm willing to figure out who added this to the patch and ping them about it, but before I do, what should I ask them to clarify? Exactly which version of Hadoop they had in mind when making the changes to HadoopUtil? > On 2012-03-08 23:42:23, Dmitriy Lyubimov wrote: > > trunk/core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java, > > line 356 > > <https://reviews.apache.org/r/4237/diff/1/?file=88920#file88920line356> > > > > This and some other changes look suspiciously like a functional change > > rather than a hadoop compatibility change. I think it would help somebody > > else more familiar with this code to assure it is indeed benign.. The changes in tests were needed because some tests were requiring outputs to appear in a certain order. That is to say, we'd need to output "cluster 1 contains items a, b and d; cluster 2 contains c, e and f"; finding the same set of clusters but calling {c,e,f} "cluster 1" would not do. Under Hadoop 0.23.1, sometimes these outputs come in a different order. Most of the changes to tests simply allow permutations in the expected output. Mean shift was it's own special case - it is very sensitive to these ordering effects, and it is an iterative process. I didn't dig into the implementation enough to convince myself it was possible to devise a new test that would be immune to ordering (though it's likely this is true); I just found an input permutation that gave a consistent answer between 0.20.203 and 0.23.1. In the other cases I think I made the tests a little better, but in this case I made the test pass. Is there someone you'd nominate as an additional reviewer? - tom ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4237/#review5754 ----------------------------------------------------------- On 2012-03-08 01:24:21, tom pierce wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4237/ > ----------------------------------------------------------- > > (Updated 2012-03-08 01:24:21) > > > Review request for mahout and Dmitriy Lyubimov. > > > Summary > ------- > > This is the current patch for MAHOUT-822 (as posted by Bilung Lee). > > > This addresses bug MAHOUT-822. > https://issues.apache.org/jira/browse/MAHOUT-822 > > > Diffs > ----- > > trunk/core/pom.xml 1296318 > trunk/core/src/main/java/org/apache/mahout/common/HadoopUtil.java 1296318 > > trunk/core/src/test/java/org/apache/mahout/classifier/df/mapreduce/partial/MockContext.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/classifier/df/mapreduce/partial/PartialSequentialBuilder.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/classifier/df/mapreduce/partial/Step1MapperTest.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java > 1296318 > trunk/core/src/test/java/org/apache/mahout/common/DummyCounter.java 1296318 > trunk/core/src/test/java/org/apache/mahout/common/DummyRecordWriter.java > 1296318 > trunk/core/src/test/java/org/apache/mahout/common/DummyStatusReporter.java > 1296318 > > trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java > 1296318 > > trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java > 1296318 > trunk/pom.xml 1296318 > > Diff: https://reviews.apache.org/r/4237/diff > > > Testing > ------- > > Passes unit tests under default config as well as under hadoop 0.23.1. > > > Thanks, > > tom > >