[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-228:
-------------------------------

    Attachment: r.csv
                logP.csv
                sgd.csv

I have been doing some testing on the training algorithm, and there seems to be a glitch in it. The problem is that the prior gradient is strong enough that for any lambda that is not really small, the regularization zeros out all of the coefficients on every iteration. Not good.

I will attach some sample data that I have been using for these experiments. The reference for these experiments was an optimization I did in R, where I explicitly optimized a simple example and got very plausible results. For the R example, I used the following definition of the function to optimize:

{noformat}
f <- function(beta) {
  # probability of class 1 for each row of x
  p <- w(rowSums(x %*% matrix(beta, ncol = 1)))
  # negative log likelihood; the (p == 0) and (p == 1) terms guard against log(0)
  r1 <- -sum(y * log(p + (p == 0)) + (1 - y) * log(1 - p + (p == 1)))
  # negative log of the L1 (Laplace) prior
  r2 <- lambda * sum(abs(beta))
  r1 + r2
}

w <- function(x) {
  1 / (1 + exp(-x))
}
{noformat}

Here beta is the coefficient vector, lambda sets the amount of regularization, x holds the input vectors with one observation per row, y gives the known categories for the rows of x, f is the combined negative log likelihood (r1) and negative log prior (r2), and w is the logistic function. I used the unsimplified form of the overall logistic likelihood for clarity. Normally a simpler form such as -sum(y - p) is used, but I wanted to keep things straightforward.

The attached file sgd.csv contains the value of x. The value of y is simply 30 0's followed by 30 1's. Optimization was done using this:

{noformat}
# beta holds the starting point for the optimizer
lambda <- 0.1
beta.01 <- optim(beta, f, method = "CG", control = list(maxit = 10000))

lambda <- 1
beta.1 <- optim(beta, f, method = "CG", control = list(maxit = 10000))

lambda <- 10
beta.10 <- optim(beta, f, method = "CG", control = list(maxit = 10000))
{noformat}

The values obtained for beta are in the file r.csv, and the log-MAP likelihoods are in logP.csv.

I will shortly add a patch that has my initial test in it. That patch will contain these test data files. I will be working on this problem off and on over the next few days, but any hints that anybody has are welcome. My expectation is that there is a silly oversight in my Java code.

> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-1.patch, MAHOUT-228-2.patch, r.csv, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.
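
For whoever else looks at the regularization glitch described above: one standard way to keep a per-step L1 prior from wiping out all of the coefficients is to clip the prior update at zero instead of applying the full +/- eta * lambda step, the "truncated gradient" trick of Langford, Li, and Zhang. Below is a minimal sketch of that update, assuming a dense double[] coefficient vector; the class and method names are illustrative only, not the code in the patch:

{noformat}
/**
 * Minimal sketch of one SGD step for logistic regression with an L1 prior,
 * using a clipped (truncated) regularization update.  Illustrative names,
 * not the actual Mahout classes.
 */
public class ClippedL1Sgd {
  private final double[] beta;   // coefficients
  private final double lambda;   // L1 regularization strength
  private final double eta;      // learning rate

  public ClippedL1Sgd(int numFeatures, double lambda, double eta) {
    this.beta = new double[numFeatures];
    this.lambda = lambda;
    this.eta = eta;
  }

  private double logistic(double z) {
    return 1.0 / (1.0 + Math.exp(-z));
  }

  /** One SGD step on a single example x with label y in {0, 1}. */
  public void train(double[] x, int y) {
    double z = 0;
    for (int i = 0; i < beta.length; i++) {
      z += beta[i] * x[i];
    }
    double p = logistic(z);
    for (int i = 0; i < beta.length; i++) {
      // gradient step on the log likelihood
      double b = beta[i] + eta * (y - p) * x[i];
      // prior step, clipped at zero: shrink |b| by eta * lambda, but never
      // let the prior step carry the coefficient past zero.  Applying the
      // full prior step without clipping is exactly the failure mode that
      // zeros every coefficient whenever lambda is not tiny.
      if (b > 0) {
        beta[i] = Math.max(0, b - eta * lambda);
      } else {
        beta[i] = Math.min(0, b + eta * lambda);
      }
    }
  }
}
{noformat}

With the clipped step a coefficient can still be shrunk to exactly zero, which is the sparsity the L1 prior is supposed to produce, but a single prior step can never overshoot zero and erase a weight that the likelihood gradient supports.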