[
https://issues.apache.org/jira/browse/MAHOUT-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982088#comment-13982088
]
Richard Scharrer edited comment on MAHOUT-1525 at 4/26/14 9:28 PM:
-------------------------------------------------------------------
Solved it. I don't know why it is programmed this way, but
validateAdaptiveLogistic fills its confusion matrix with the true target on
both axes, so the matrix shows what a perfect classification would look like
instead of what the model actually predicts. This can be fixed by changing:
cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(target));
to:
Vector result = learner.classifyFull(v);
int cat = result.maxValueIndex();
cm.addInstance(csv.getTargetString(line), csv.getTargetLabel(cat));
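
To illustrate why the original call can only ever produce a diagonal matrix, here is a minimal standalone sketch. It does not use any Mahout classes: ConfusionSketch, argmax, tally and offDiagonal are hypothetical names invented for this example, and the scores are made-up numbers. argmax only plays the role that result.maxValueIndex() plays in the change above.

```java
import java.util.Arrays;

// Standalone illustration; not Mahout code. All names and data are hypothetical.
class ConfusionSketch {

    // Index of the largest score, i.e. the predicted category.
    static int argmax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Rows are the true categories; columns are whatever is passed as columnLabels.
    static int[][] tally(int[] actual, int[] columnLabels, int numCategories) {
        int[][] matrix = new int[numCategories][numCategories];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][columnLabels[i]]++;
        }
        return matrix;
    }

    // Number of instances that fall outside the diagonal (misclassifications).
    static int offDiagonal(int[][] matrix) {
        int count = 0;
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                if (r != c) {
                    count += matrix[r][c];
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] actual = {0, 1, 2, 1};
        double[][] scores = {
            {0.7, 0.2, 0.1},
            {0.5, 0.3, 0.2},   // true category is 1, but the model prefers 0
            {0.1, 0.2, 0.7},
            {0.2, 0.6, 0.2}
        };

        int[] predicted = new int[actual.length];
        for (int i = 0; i < actual.length; i++) {
            predicted[i] = argmax(scores[i]);
        }

        // Same pattern as the original call: the true label is used for both axes.
        int[][] alwaysDiagonal = tally(actual, actual, 3);
        // Same pattern as the change: the column comes from the model's prediction.
        int[][] fromModel = tally(actual, predicted, 3);

        System.out.println("true vs. true:      " + Arrays.deepToString(alwaysDiagonal));
        System.out.println("true vs. predicted: " + Arrays.deepToString(fromModel));
    }
}
```

With the true label on both axes, every instance lands on the diagonal no matter what the model outputs, which matches the "perfect" matrix reported in the issue.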
> train/validateAdaptiveLogistic
> ------------------------------
>
> Key: MAHOUT-1525
> URL: https://issues.apache.org/jira/browse/MAHOUT-1525
> Project: Mahout
> Issue Type: Question
> Components: Classification
> Affects Versions: 0.9
> Reporter: Richard Scharrer
> Labels: adaptiveLogisticRegression, newbie
>
> Hi,
> I tried to use train- and validateAdaptiveLogistic on my data, which looks like this:
> category, id, var1, var2, ...var72 (all numeric)
> I used the following settings:
> mahout trainAdaptiveLogistic --input resource/trainingData \
> --output ./model \
> --target category --categories 9 \
> --predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ..... \
> --types numeric \
> --passes 100 \
> --showperf
> mahout validateAdaptiveLogistic --input resource/testData --model model \
> --confusion --defaultCategory none
> The output of validateAdaptiveLogistic is:
> Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    d    e    f    g    h    i    <--Classified as
> 14   0    0    0    0    0    0    0     |  14    a = projekt
> 0    18   0    0    0    0    0    0     |  18    b = news/aktuelles/presse
> 0    0    24   0    0    0    0    0     |  24    d = lehrveranstaltung
> 0    0    0    19   0    0    0    0     |  19    e = publikation
> 0    0    0    0    20   0    0    0     |  20    f = event
> 0    0    0    0    0    14   0    0     |  14    g = mitarbeiter/person
> 0    0    0    0    0    0    44   0     |  44    h = übersicht
> 0    0    0    0    0    0    0    13    |  13    i = institut
> (In case you were wondering, the categories are in German.)
> My problem is that this is impossible. I always get a perfect classification,
> even with only a small amount of training data. It doesn't even matter how
> many features I use: I tried it with all 72 and with only one. Am I missing
> something?
> Regards,
> Richard