Richard Scharrer created MAHOUT-1525:
----------------------------------------
Summary: train/validateAdaptiveLogistic
Key: MAHOUT-1525
URL: https://issues.apache.org/jira/browse/MAHOUT-1525
Project: Mahout
Issue Type: Question
Components: Classification
Affects Versions: 0.7
Reporter: Richard Scharrer
Hi,
I tried to use trainAdaptiveLogistic and validateAdaptiveLogistic on my data, which looks like this:
category, id, var1, var2, ...var72 (all numeric)
I used the following settings:
mahout trainAdaptiveLogistic --input resource/trainingData \
--output ./model \
--target category --categories 9 \
--predictors a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 ..... \
--types numeric \
--passes 100 \
--showperf

mahout validateAdaptiveLogistic --input resource/testData --model model \
--confusion --defaultCategory none
The output of validateAdaptiveLogistic is:
Log-likelihood:Min=-5.54, Max=-0.04, Mean=-1.58, Median=-1.33
=======================================================
Confusion Matrix
-------------------------------------------------------
a     b     d     e     f     g     h     i     <--Classified as
14    0     0     0     0     0     0     0      |  14    a = projekt
0     18    0     0     0     0     0     0      |  18    b = news/aktuelles/presse
0     0     24    0     0     0     0     0      |  24    d = lehrveranstaltung
0     0     0     19    0     0     0     0      |  19    e = publikation
0     0     0     0     20    0     0     0      |  20    f = event
0     0     0     0     0     14    0     0      |  14    g = mitarbeiter/person
0     0     0     0     0     0     44    0      |  44    h = übersicht
0     0     0     0     0     0     0     13     |  13    i = institut
(in case you were wondering, the category names are in German)
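To make the "perfect" result concrete, here is a minimal Python sketch that recomputes the accuracy from the confusion matrix above. It is not part of Mahout; the variable names are my own, and it assumes that every off-diagonal cell is 0, as the printed output shows.

```python
# Diagonal counts copied from the confusion matrix above.
# Assumption: all off-diagonal cells are 0, as printed by validateAdaptiveLogistic.
labels = [
    "projekt",
    "news/aktuelles/presse",
    "lehrveranstaltung",
    "publikation",
    "event",
    "mitarbeiter/person",
    "übersicht",
    "institut",
]
diagonal = [14, 18, 24, 19, 20, 14, 44, 13]

# Rebuild the full square matrix: rows are actual classes, columns are predictions.
n = len(labels)
matrix = [[diagonal[i] if i == j else 0 for j in range(n)] for i in range(n)]

total = sum(sum(row) for row in matrix)          # all test instances
correct = sum(matrix[i][i] for i in range(n))    # instances on the diagonal
accuracy = correct / total

print(f"{correct}/{total} correct, accuracy = {accuracy:.3f}")
# prints "166/166 correct, accuracy = 1.000"
```

In other words, all 166 test instances end up on the diagonal, across the 8 categories that appear in the matrix.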
My problem is that this result seems impossible. I always get a perfect classification,
even with only a small amount of training data. It doesn't even matter how many
features I use: I tried it with all 72 and with only one. Am I missing something?
Regards,
Richard