Re: Classifying general Attribute-Relation data using Mahout

Robin Anil Tue, 09 Feb 2010 10:44:03 -0800

I have the data. I will upload it shortly.


On Wed, Feb 10, 2010 at 12:10 AM, Ted Dunning <[email protected]> wrote:

> Martin,
>
> I saw only one attachment here.  The other may have been stripped by the
> mailing list which prefers not to have attachments.
>
> I have filed an issue for this at
> https://issues.apache.org/jira/browse/MAHOUT-286
>
> Can you attach your data files there so that we can work on getting a
> better resolution for you?
>
> On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <[email protected]> wrote:
>
> > Hi Robin,
> >
> > The attached data.arff contains the test data, data.training.arff
> > contains the training data. We're running the svn trunk (r906954) of
> > Mahout. The attached script run.sh shows how we run it.
> > Should it be possible to run Mahout's NaiveBayes classifier on this
> > data in this way or is it limited to text documents only?
> >
> > Side note: We're expecting Weka to report 100% incorrect
> > classification since all test data belongs to the class "unknown",
> > whereas the training data is either "valid" or "invalid" (in fact, the
> > test data is the entire "invalid" set, so Weka manages to classify
> > everything correctly). We're not yet sure what class to put on the
> > test data, as we of course can't know anything about it (hence the
> > "unknown").
> >
> > 2010/2/8 Robin Anil <[email protected]>:
> > > Can you send the train and test data to me? Are you using the 0.2
> > > release or the trunk?
> > >
> > > It seems the model wasn't built, as there was an error:
> > >
> > > Exception in thread "main"
> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > file:/tmp/hadoop/model/trainer-termDocCount
> > > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
> > > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
> > >
> > > So there is no point in running the classifier.
> > >
> > > Weka does not seem to be doing well either.
> > >
> > >
> > >
> > > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> We're experimenting a bit with Weka and Mahout. Our input data is a
> > >> relation in ARFF format (see attached data.training.arff), and we'd
> > >> like to classify it using Mahout. However, it seems (to us, at first)
> > >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
> > >> centered around documents of text, and not general attribute data.
> > >> Thus, running the classifier causes our ARFF data to be interpreted as
> > >> a document of words, with not very useful results (see attached
> > >> mahout.log).
> > >>
> > >> With Weka, we're able to get the results we want (see attached
> > >> weka.log).
> > >>
> > >> Any suggestions for how to get this working?
> > >>
> > >> Thanks!
> > >>
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
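
The original question was how to get ARFF attribute data through a
classifier interface that is centered around documents of text. One possible
workaround is to turn each attribute value into a token such as name=value,
so that every data row becomes a small "document". Below is a minimal Python
sketch of that idea. It assumes that the class is the last declared
attribute, that values contain no quoted commas, and that the trainer
accepts one example per line as label, tab, space-separated tokens (that
input format is an assumption, not something confirmed in this thread). The
function name arff_to_token_lines is hypothetical.

```python
def arff_to_token_lines(arff_text):
    """Convert simple ARFF text into 'label<TAB>attr=value ...' lines.

    Assumptions (not taken from the thread): dense rows only, no quoted
    values containing commas, and the class is the last declared attribute.
    """
    names = []        # attribute names, in declaration order
    in_data = False   # becomes True once the @data marker has been seen
    lines = []
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and ARFF comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                names.append(line.split()[1])  # second field is the name
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(names):
            raise ValueError("row does not match attribute count: " + line)
        label = values[-1]
        # Skip missing values ("?") instead of emitting a token for them.
        tokens = [n + "=" + v
                  for n, v in zip(names[:-1], values[:-1]) if v != "?"]
        lines.append(label + "\t" + " ".join(tokens))
    return lines
```

Numeric attributes would probably need to be binned into ranges first,
since a token like size=3 only matches rows with exactly the same value.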
