Thanks deneche, I will test it soon.
On Sat, Apr 3, 2010 at 12:21 PM, deneche abdelhakim <a_dene...@yahoo.fr> wrote:

> Hi,
>
> I just committed a new version of TestForest. If you add "-mr" to the
> command line, it should launch a Hadoop job to classify the data. This is
> a basic implementation that cannot compute the confusion matrix, so using
> "-a" has no effect. This implementation is also not very well tested
> (being a work in progress), so if you want to test it, select a random
> subset of your test data and classify it using the sequential
> implementation (without -mr), then compare those predictions with the
> distributed implementation's. The results won't be exactly the same (due
> to the random behavior of the classifier when it encounters ties), but
> about 90% of the predictions should be the same.
>
> Let me know what you think of it. I'm working on the confusion matrix,
> but it will take some time to finish.
>
> --- On Fri, 26.3.10, Yang Sun <soushare....@gmail.com> wrote:
>
> > From: Yang Sun <soushare....@gmail.com>
> > Subject: Question about mahout Describe
> > To: mahout-user@lucene.apache.org
> > Date: Friday, March 26, 2010, 10:16 PM
> >
> > I was testing mahout recently. It runs great on small testing datasets.
> > However, when I try to expand the dataset to a big dataset directory,
> > I got the following error message:
> >
> > [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job \
> >     org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* \
> >     -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> >
> > Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
> >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
> >     at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
> >     at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > My question is: can I use mahout on directories instead of single
> > files? And how?
> >
> > Thanks,
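The "compare the two runs" check deneche describes (sequential vs. -mr predictions agreeing on roughly 90% of rows, with ties broken randomly) can be scripted once both prediction files are loaded into arrays. A minimal sketch — `PredictionAgreement` and its toy label arrays are hypothetical, not part of Mahout:

```java
public class PredictionAgreement {
    // Fraction of positions where the two prediction arrays agree.
    static double agreement(int[] sequential, int[] distributed) {
        if (sequential.length != distributed.length) {
            throw new IllegalArgumentException("prediction counts differ");
        }
        int matches = 0;
        for (int i = 0; i < sequential.length; i++) {
            if (sequential[i] == distributed[i]) {
                matches++;
            }
        }
        return (double) matches / sequential.length;
    }

    public static void main(String[] args) {
        // Toy class labels standing in for the two runs' outputs.
        int[] seq = {0, 1, 1, 0, 1, 0, 0, 1, 1, 0};
        int[] mr  = {0, 1, 1, 0, 1, 0, 0, 1, 0, 0}; // one tie broken differently
        System.out.println(agreement(seq, mr)); // prints 0.9
    }
}
```

If the agreement stays near or above 0.9, the distributed job is consistent with the sequential one, per the explanation above; the remaining disagreements are expected from random tie-breaking.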
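The exception shows what went wrong: the shell did not expand `/user/fulltestdata/*` (the files are on HDFS, not the local filesystem), so TestForest passed the literal pattern to `DFSClient.open`, which fails because no file is named `*`. A tool that accepts a directory or pattern has to expand it into concrete files itself before opening each one; on HDFS that is what `FileSystem.globStatus(Path)` / `listStatus(Path)` provide. Below is a local-filesystem analogue of that expansion step using only `java.nio` — the `GlobExpand` class and `part-*` file names are illustrative, not Mahout code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobExpand {
    // Expand a glob such as "part-*" against a directory into concrete
    // file paths. This mirrors what a tool must do before opening each
    // input; opening the raw pattern string fails, as in the trace above.
    static List<Path> expand(Path dir, String pattern) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with two typical Hadoop part files.
        Path dir = Files.createTempDirectory("fulltestdata");
        Files.createFile(dir.resolve("part-00000"));
        Files.createFile(dir.resolve("part-00001"));
        System.out.println(expand(dir, "part-*").size()); // prints 2
    }
}
```

The same pattern applies on HDFS by substituting the Hadoop `FileSystem` calls for `Files.newDirectoryStream`; either way, the fix is to iterate over the expanded file list rather than open the pattern itself.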