Author: tdunning
Date: Wed Sep 8 18:49:59 2010
New Revision: 995189
URL: http://svn.apache.org/viewvc?rev=995189&view=rev
Log:
Use logLikelihood for fitness in non-binary case
Modified:
mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java
Modified: mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java
URL: http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java?rev=995189&r1=995188&r2=995189&view=diff
==============================================================================
--- mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java (original)
+++ mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java Wed Sep 8 18:49:59 2010
@@ -44,11 +44,18 @@ import java.util.concurrent.ExecutionExc
* of performance on the fly even if we make many passes through the data. This does, however,
* increase the cost of training since if we are using 5-fold cross-validation, each vector is used
* 4 times for training and once for classification. If this becomes a problem, then we should
- * probably use a 2-way unbalanced train/test split rather than full cross validation.
- *
+ * probably use a 2-way unbalanced train/test split rather than full cross validation. With the
+ * current default settings, we have 100 learners running. This is better than the alternative
+ * of running hundreds of training passes to find good hyper-parameters because we only have to
+ * parse and feature-ize our inputs once. If you already have good hyper-parameters, then you
+ * might prefer to just run one CrossFoldLearner with those settings.
+ * <p/>
* The fitness used here is AUC. Another alternative would be to try log-likelihood, but it is
* much easier to get bogus values of log-likelihood than with AUC and the results seem to
- * accord pretty well. It would be nice to allow the fitness function to be pluggable.
+ * accord pretty well. It would be nice to allow the fitness function to be pluggable. This
+ * use of AUC means that AdaptiveLogisticRegression is mostly suited for binary target variables.
+ * This will be fixed before long by extending OnlineAuc to handle non-binary cases or by using
+ * a different fitness value in non-binary cases.
*/
public class AdaptiveLogisticRegression implements OnlineLearner {
private int record = 0;
@@ -100,7 +107,11 @@ public class AdaptiveLogisticRegression
x.train(example);
}
if (x.getLearner().validModel()) {
- return x.wrapped.auc();
+ if (x.getLearner().numCategories() == 2) {
+ return x.wrapped.auc();
+ } else {
+ return x.wrapped.logLikelihood();
+ }
} else {
return Double.NaN;
}
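
The second hunk chooses the fitness value by the number of target categories: AUC when there are exactly two, log-likelihood otherwise. The following is a minimal, self-contained sketch of that rule under stated assumptions. It is not Mahout's code, and `FitnessSketch`, `auc`, `meanLogLikelihood` and `fitness` are hypothetical names. It assumes per-example class probabilities are already available, computes AUC by pairwise comparison of positives against negatives (ties count one half), and takes log-likelihood to be the mean log-probability assigned to the true category.

```java
/**
 * Hypothetical sketch (not Mahout's API) of the fitness rule in this commit:
 * AUC for a binary target, mean log-likelihood for more than two categories.
 */
public class FitnessSketch {

  /** Pairwise AUC: fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2. */
  static double auc(double[] scores, int[] labels) {
    double correct = 0;
    long pairs = 0;
    for (int i = 0; i < scores.length; i++) {
      if (labels[i] != 1) {
        continue;
      }
      for (int j = 0; j < scores.length; j++) {
        if (labels[j] != 0) {
          continue;
        }
        pairs++;
        if (scores[i] > scores[j]) {
          correct += 1;
        } else if (scores[i] == scores[j]) {
          correct += 0.5;
        }
      }
    }
    return pairs == 0 ? Double.NaN : correct / pairs;
  }

  /** Mean log-probability of the true category; probs[i][k] is the assumed P(category k | example i). */
  static double meanLogLikelihood(double[][] probs, int[] labels) {
    double sum = 0;
    for (int i = 0; i < labels.length; i++) {
      sum += Math.log(probs[i][labels[i]]);
    }
    return sum / labels.length;
  }

  /** Mirrors the branch added in the diff: AUC for two categories, log-likelihood otherwise. */
  static double fitness(int numCategories, double[][] probs, int[] labels) {
    if (numCategories == 2) {
      double[] scores = new double[labels.length];
      for (int i = 0; i < labels.length; i++) {
        scores[i] = probs[i][1]; // score = probability of category 1
      }
      return auc(scores, labels);
    } else {
      return meanLogLikelihood(probs, labels);
    }
  }

  public static void main(String[] args) {
    double[][] binary = {{0.9, 0.1}, {0.3, 0.7}, {0.4, 0.6}, {0.8, 0.2}};
    System.out.println(fitness(2, binary, new int[] {0, 1, 1, 0}));

    double[][] multi = {{0.7, 0.2, 0.1}, {0.1, 0.8, 0.1}, {0.2, 0.2, 0.6}};
    System.out.println(fitness(3, multi, new int[] {0, 1, 2}));

    // A single prediction that gives the true category probability 0 makes Math.log return
    // -Infinity, so the whole average becomes -Infinity.
    double[][] overconfident = {{1.0, 0.0, 0.0}, {0.1, 0.8, 0.1}};
    System.out.println(fitness(3, overconfident, new int[] {1, 1}));
  }
}
```

The last case in `main` illustrates one way log-likelihood can degenerate where AUC cannot, which is consistent with, though not necessarily the full reason behind, the class comment's remark about bogus log-likelihood values. The quadratic pairwise AUC is only for illustration; an online learner would need an incremental estimate rather than storing every score.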
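
The class comment's cost argument (with 5-fold cross-validation, each vector is used 4 times for training and once for classification) can be illustrated with a small bookkeeping sketch. It assumes, hypothetically, that examples are assigned to folds by `record % folds`. `CrossFoldSketch` and `route` are invented names, and the counters stand in for real training and classification calls.

```java
import java.util.Arrays;

/**
 * Hypothetical bookkeeping sketch of online k-fold cross-validation: each example is
 * scored by exactly one fold-model (its held-out fold) and used to train the other k - 1.
 */
public class CrossFoldSketch {
  final int folds;
  final int[] trainCount; // examples each fold-model has trained on
  final int[] testCount;  // examples each fold-model has scored

  CrossFoldSketch(int folds) {
    this.folds = folds;
    this.trainCount = new int[folds];
    this.testCount = new int[folds];
  }

  /** Routes one non-negative record number; returns how many fold-models trained on it. */
  int route(long record) {
    int heldOut = (int) (record % folds); // assumed fold-assignment rule
    int trained = 0;
    for (int k = 0; k < folds; k++) {
      if (k == heldOut) {
        testCount[k]++;  // a real learner would classify here and update its fitness estimate
      } else {
        trainCount[k]++; // a real learner would take a training step here
        trained++;
      }
    }
    return trained;
  }

  public static void main(String[] args) {
    CrossFoldSketch cv = new CrossFoldSketch(5);
    for (long record = 0; record < 100; record++) {
      cv.route(record);
    }
    System.out.println("train: " + Arrays.toString(cv.trainCount));
    System.out.println("test:  " + Arrays.toString(cv.testCount));
  }
}
```

With 5 folds and 100 records, each fold-model trains on 80 records and scores 20, so every record costs four training updates plus one classification, the overhead the comment weighs against a 2-way train/test split.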