I have the data. I will upload shortly
On Wed, Feb 10, 2010 at 12:10 AM, Ted Dunning <[email protected]> wrote:
> Martin,
>
> I saw only one attachment here. The other may have been stripped by the
> mailing list, which prefers not to have attachments.
>
> I have filed an issue for this at
> https://issues.apache.org/jira/browse/MAHOUT-286
>
> Can you attach your data files there so that we can work on getting a
> better resolution for you?
>
> On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <[email protected]> wrote:
>
> > Hi Robin,
> >
> > The attached data.arff contains the test data, and data.training.arff
> > contains the training data. We're running the svn trunk (r906954) of
> > Mahout. The attached script run.sh shows how we run it.
> > Should it be possible to run Mahout's NaiveBayes classifier on this
> > data in this way, or is it limited to text documents only?
> >
> > Side note: We're expecting Weka to report 100% incorrect
> > classification since all test data belongs to the class "unknown",
> > whereas the training data is either "valid" or "invalid" (in fact, the
> > test data is the entire "invalid" set, so Weka manages to classify
> > everything correctly). We're not yet sure what class to put on the
> > test data, as we of course can't know anything about it (hence the
> > "unknown").
> >
> > 2010/2/8 Robin Anil <[email protected]>:
> > > Can you send the train and test data to me? Are you using the 0.2
> > > release or the trunk?
> > >
> > > It seems the model wasn't built, as there was an error:
> > > Exception in thread "main"
> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > file:/tmp/hadoop/model/trainer-termDocCount
> > > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
> > > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
> > >
> > > So there is no point running the classifier.
> > >
> > > Weka doesn't seem to be doing well either.
> > >
> > > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> We're experimenting a bit with Weka and Mahout. Our input data is a
> > >> relation in ARFF format (see attached data.training.arff), and we'd
> > >> like to classify it using Mahout. However, it seems (to us, at first)
> > >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
> > >> centered around text documents rather than general attribute data.
> > >> Thus, running the classifier causes our ARFF data to be interpreted as
> > >> a document of words, with not very useful results (see attached
> > >> mahout.log).
> > >>
> > >> With Weka, we're able to get the results we want (see attached
> > >> weka.log).
> > >>
> > >> Any suggestions for how to get this working?
> > >>
> > >> Thanks!
>
> --
> Ted Dunning, CTO
> DeepDyve
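Robin's diagnosis was that the trainer never wrote its output, so the classifier failed with InvalidInputException. That condition can be checked before the classifier is run. The sketch below is a minimal Python helper and is not part of Mahout. It assumes the model directory is /tmp/hadoop/model, taken from the error message, and that the three trainer outputs named there are the ones required. The function name missing_model_outputs is hypothetical.

```python
import os

# Assumption: model directory taken from the error message in this thread;
# adjust it if run.sh writes the model somewhere else.
MODEL_DIR = "/tmp/hadoop/model"

# Assumption: the trainer outputs the classifier reported as missing.
REQUIRED_OUTPUTS = [
    "trainer-termDocCount",
    "trainer-wordFreq",
    "trainer-featureCount",
]


def missing_model_outputs(model_dir=MODEL_DIR):
    """Return the required trainer outputs that do not exist under model_dir.

    Hypothetical helper: an empty list means every expected output path
    exists, not that its contents are valid.
    """
    return [
        name
        for name in REQUIRED_OUTPUTS
        if not os.path.exists(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_model_outputs()
    if missing:
        print("Model not built; rerun training first. Missing:", ", ".join(missing))
    else:
        print("All trainer outputs present; the classifier can be run.")
```

Running this between the training and classification steps of a script like run.sh would surface the training failure directly, rather than as an InvalidInputException from the classifier.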
