Richard Scharrer created MAHOUT-1525:
----------------------------------------

             Summary: train/validateAdaptiveLogistic
                 Key: MAHOUT-1525
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1525
             Project: Mahout
          Issue Type: Question
          Components: Classification
    Affects Versions: 0.7
            Reporter: Richard Scharrer


Hi,
I tried to use trainAdaptiveLogistic and validateAdaptiveLogistic on my data, which looks like this:
category, id, var1, var2, ...var72 (all numeric)

I used the following settings:
mahout trainAdaptiveLogistic --input resource/trainingData \
--output ./model \
--target category --categories 9 \
--predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ..... \
--types numeric \
--passes 100 \
--showperf

mahout validateAdaptiveLogistic --input resource/testData --model model \
--confusion --defaultCategory none

The output of validateAdaptiveLogistic is:
Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       d       e       f       g       h       i       <--Classified as
14      0       0       0       0       0       0       0        |  14          a     = projekt
0       18      0       0       0       0       0       0        |  18          b     = news/aktuelles/presse
0       0       24      0       0       0       0       0        |  24          d     = lehrveranstaltung
0       0       0       19      0       0       0       0        |  19          e     = publikation
0       0       0       0       20      0       0       0        |  20          f     = event
0       0       0       0       0       14      0       0        |  14          g     = mitarbeiter/person
0       0       0       0       0       0       44      0        |  44          h     = übersicht
0       0       0       0       0       0       0       13       |  13          i     = institut


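(In case you were wondering, the category names are German: projekt = project, news/aktuelles/presse = news/current/press, lehrveranstaltung = course, publikation = publication, event = event, mitarbeiter/person = staff/person, übersicht = overview, institut = institute.)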
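
As a quick arithmetic check of the matrix: every test row sits on the diagonal, so the reported accuracy is 166 out of 166. The sketch below just re-enters the printed counts by hand (the names LABELS, MATRIX and accuracy are hypothetical, not part of Mahout) and recomputes the row totals and the overall accuracy.

```python
# Confusion matrix copied by hand from the validateAdaptiveLogistic output.
# Rows = actual category, columns = "classified as", both in this order:
LABELS = ["a", "b", "d", "e", "f", "g", "h", "i"]
MATRIX = [
    [14, 0, 0, 0, 0, 0, 0, 0],
    [0, 18, 0, 0, 0, 0, 0, 0],
    [0, 0, 24, 0, 0, 0, 0, 0],
    [0, 0, 0, 19, 0, 0, 0, 0],
    [0, 0, 0, 0, 20, 0, 0, 0],
    [0, 0, 0, 0, 0, 14, 0, 0],
    [0, 0, 0, 0, 0, 0, 44, 0],
    [0, 0, 0, 0, 0, 0, 0, 13],
]


def row_totals(matrix):
    """Number of test rows per actual category (the '| n' column)."""
    return [sum(row) for row in matrix]


def accuracy(matrix):
    """Return (correct, total, fraction correct); correct = diagonal sum."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(row_totals(matrix))
    return correct, total, correct / total


if __name__ == "__main__":
    correct, total, acc = accuracy(MATRIX)
    print(dict(zip(LABELS, row_totals(MATRIX))))
    print(f"{correct}/{total} correct, accuracy = {acc:.3f}")
```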

My problem is that this result is implausible: I always get a perfect classification,
even with very little training data. It doesn't even matter how many features I use;
I tried it with all 72 and with only one. Am I missing something?
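
The log-likelihood summary can be cross-checked against the all-diagonal matrix. This is a sketch under two assumptions not confirmed by the output itself: that the reported log-likelihood is the natural log of the probability the model assigns to each test row's actual category, and that those probabilities sum to 1 over the k categories. Under those assumptions the top category always gets at least 1/k, so a correctly classified row cannot score below -ln(k). The reported Min=-5.54 is well below -ln 9 (about -2.197), so if the assumptions hold, the Min and the perfect confusion matrix cannot both be right. The helper names below are hypothetical.

```python
import math

# Figures copied from the validateAdaptiveLogistic output.
LL_MIN, LL_MAX, LL_MEAN = -5.54, -0.04, -1.58
# Value passed as --categories (only 8 categories appear in the test data).
NUM_CATEGORIES = 9


def min_loglik_if_correct(k):
    """Lowest log p(actual category) a correctly classified row can have.

    ASSUMPTION: probabilities over k categories sum to 1, so the argmax
    category gets at least 1/k; a row classified correctly therefore has
    log p >= ln(1/k) = -ln(k).
    """
    return -math.log(k)


def consistent_with_perfect_accuracy(ll_min, k):
    """True if even the worst row could still have been classified correctly."""
    return ll_min >= min_loglik_if_correct(k)


if __name__ == "__main__":
    bound = min_loglik_if_correct(NUM_CATEGORIES)
    print(f"lower bound for a correct row (k={NUM_CATEGORIES}): {bound:.3f}")
    print(f"p(actual category) at Min: {math.exp(LL_MIN):.4f}")
    print("Min consistent with 100% accuracy:",
          consistent_with_perfect_accuracy(LL_MIN, NUM_CATEGORIES))
```

Using k=8 (the categories actually present) gives a tighter bound of about -2.079 and the same conclusion.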

Regards,
Richard



--
This message was sent by Atlassian JIRA
(v6.2#6252)
