[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-228:
-------------------------------

    Attachment: r.csv
                logP.csv
                sgd.csv


I have been doing some testing on the training algorithm and there seems to be 
a glitch in it.  The problem is that the prior gradient is strong enough that 
for any lambda larger than a very small value, the regularization zeros out all 
of the coefficients on every iteration.  Not good.
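
To illustrate the failure mode, here is a minimal sketch of a truncation-style 
L1 shrinkage step; the learning rate eta and the soft-threshold form are my 
assumptions about what an SGD update would look like, not the actual code in 
the patch:
{noformat}
# Hypothetical sketch of an L1 (Laplacian prior) shrinkage step in SGD.
# eta (learning rate) and the soft-threshold form are assumptions, not
# the actual Mahout implementation.
shrink <- function(beta, lambda, eta) {
    # Each step pulls every coefficient toward zero by eta * lambda;
    # any coefficient with |beta_i| < eta * lambda is truncated to 0.
    sign(beta) * pmax(0, abs(beta) - eta * lambda)
}

beta <- c(0.05, -0.02, 0.4)
shrink(beta, lambda = 1, eta = 0.1)   # gives 0.0 0.0 0.3 - small coefficients vanish
{noformat}
If eta * lambda is large relative to the typical coefficient magnitude, every 
step like this wipes the whole vector, which matches the symptom above.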

I will attach some sample data that I have been using for these experiments.  
The reference for these experiments was an optimization I did in R, where I 
explicitly optimized a simple example and got very plausible results.

For the R example, I used the following definition of the function to optimize:

{noformat}
# the logistic function
w <- function(x) {
    1 / (1 + exp(-x))
}

# negative penalized log likelihood; x, y, and lambda are taken from
# the enclosing environment
f <- function(beta) {
    p <- w(rowSums(x %*% matrix(beta, ncol = 1)))
    # log likelihood; the (p == 0) and (p == 1) terms guard against log(0)
    r1 <- -sum(y * log(p + (p == 0)) + (1 - y) * log(1 - p + (p == 1)))
    # L1 penalty corresponding to a Laplacian prior on beta
    r2 <- lambda * sum(abs(beta))
    r1 + r2
}
{noformat}
Here beta is the coefficient vector, lambda sets the amount of regularization, 
x holds the input vectors with one observation per row, y holds the known 
categories for the rows of x, f is the combined negative log likelihood (r1) 
and log prior (r2), and w is the logistic function.  I used the unsimplified 
form of the overall logistic likelihood for clarity.  Normally a simpler form 
built around -sum(y - p) is used, but I wanted to keep things straightforward.
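
For reference, here is a sketch of the analytic gradient of f that the simpler 
form alludes to; using sign(beta) as the subgradient of the L1 term at zero is 
a simplification on my part:
{noformat}
# Sketch of the analytic gradient of f.  optim with method="CG" estimates
# gradients numerically unless one is supplied, so this is optional.
grad.f <- function(beta) {
    p <- w(rowSums(x %*% matrix(beta, ncol = 1)))
    # the y - p residual is where the simpler form comes from;
    # -colSums(x * (y - p)) is t(x) %*% (y - p) negated
    -colSums(x * (y - p)) + lambda * sign(beta)
}
{noformat}
It could be passed to optim as the gr argument to speed up the CG runs below.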

The attached file sgd.csv contains the values of x.  The values of y are 
simply 30 0's followed by 30 1's.  
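
For anyone reproducing this, the setup would look something like the 
following; whether sgd.csv has a header row is an assumption on my part:
{noformat}
# load the attached data; header = FALSE is an assumption about the
# layout of sgd.csv
x <- as.matrix(read.csv("sgd.csv", header = FALSE))
y <- c(rep(0, 30), rep(1, 30))
{noformat}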

Optimization was done using this, starting from a zero coefficient vector:
{noformat}
beta <- rep(0, ncol(x))    # starting point for the optimizer

lambda <- 0.1
beta.01 <- optim(beta, f, method = "CG", control = list(maxit = 10000))
lambda <- 1
beta.1 <- optim(beta, f, method = "CG", control = list(maxit = 10000))
lambda <- 10
beta.10 <- optim(beta, f, method = "CG", control = list(maxit = 10000))
{noformat}
The values for beta obtained are contained in the file r.csv, and the 
corresponding log-MAP likelihoods are in logP.csv.
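
The results come out of the optim return values roughly like this; the exact 
column layout of the attached files may differ from this sketch:
{noformat}
# optim returns a list: $par is the optimized beta, $value the objective
# at the optimum.  Since f is the negative log posterior, the log-MAP
# likelihood is -$value.
r <- cbind(beta.01$par, beta.1$par, beta.10$par)
logP <- -c(beta.01$value, beta.1$value, beta.10$value)
write.csv(r, "r.csv", row.names = FALSE)
write.csv(data.frame(logP), "logP.csv", row.names = FALSE)
{noformat}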

I will shortly add a patch containing my initial test along with these test 
data files.  I will be working on this problem off and on over the next few 
days, but any hints anybody has are welcome.  My expectation is that there is 
a silly oversight in my Java code.




> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-1.patch, MAHOUT-228-2.patch, r.csv, 
> sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable 
> learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a 
> reasonable place to start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
