[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802165#action_12802165 ]

Olivier Grisel commented on MAHOUT-228:
---------------------------------------

bq. Are you sure that this is correct? The lazy regularization update should be 
applied before any coefficient is used for prediction or for update. Is eager 
regularization after the update necessary?

I made it eager only for the coefficients that have just been updated by the 
current train step; regularization of the remaining coefficients is still 
delayed until the next "classify()" call touching those coefficients.

If we do not do this (or find some equivalent workaround), the coefficients 
are only regularized upon the classify() call and hence are marked as 
regularized for the current step value, while at the same time the training 
update makes the coefficients touched at the current step non-zero, inducing 
a completely dense parameter set.
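To make the mechanics concrete, here is a minimal self-contained sketch of 
that scheme; the names (LazyL1Sgd, beta as a plain double[], updateSteps, 
regularize()) are illustrative and not the actual patch. Regularization is 
lazy in classify() and eager only for the coefficients the current train() 
step just updated:

{code:java}
/**
 * Hypothetical sketch (not the actual patch): SGD logistic regression with
 * lazy L1 regularization in classify() and eager shrinkage for the
 * coefficients just updated by train().
 */
public class LazyL1Sgd {
  private final double[] beta;      // coefficients; dense array for clarity
  private final int[] updateSteps;  // step at which beta[j] was last regularized
  private final double lambda;      // L1 strength
  private final double learningRate;
  private int step = 0;             // number of train() calls so far

  public LazyL1Sgd(int numFeatures, double lambda, double learningRate) {
    this.beta = new double[numFeatures];
    this.updateSteps = new int[numFeatures];
    this.lambda = lambda;
    this.learningRate = learningRate;
  }

  // Soft-threshold beta[j] toward zero by the shrinkage accumulated since
  // this coefficient was last regularized.
  private void regularize(int j) {
    int skipped = step - updateSteps[j];
    if (skipped > 0) {
      double shrink = skipped * learningRate * lambda;
      beta[j] = Math.signum(beta[j]) * Math.max(0.0, Math.abs(beta[j]) - shrink);
      updateSteps[j] = step;
    }
  }

  // features/values encode the non-zero entries of a sparse instance.
  public double classify(int[] features, double[] values) {
    double score = 0.0;
    for (int k = 0; k < features.length; k++) {
      regularize(features[k]);  // lazy: catch up before the coefficient is read
      score += beta[features[k]] * values[k];
    }
    return 1.0 / (1.0 + Math.exp(-score));
  }

  public void train(int[] features, double[] values, int label) {
    step++;
    double p = classify(features, values); // marks touched j regularized at step
    double g = label - p;                  // logistic log-likelihood gradient
    for (int k = 0; k < features.length; k++) {
      int j = features[k];
      beta[j] += learningRate * g * values[k];
      // Eager shrinkage for the just-updated coefficient. A plain
      // regularize(j) would be a no-op here (updateSteps[j] == step), which
      // is the failure mode described above: without this line beta[j] stays
      // non-zero until the next classify() touches it, so beta reads as dense.
      beta[j] = Math.signum(beta[j])
          * Math.max(0.0, Math.abs(beta[j]) - learningRate * lambda);
    }
  }
}
{code}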

While this is not a big deal as long as beta uses a DenseMatrix 
representation, it prevents us from actually measuring the real impact of the 
lambda value through the sparsity of the parameters. On problems leading to 
very sparse models, using a SparseRowMatrix of some kind may be decisive 
performance-wise, and in that case the sparsity-inducing ability of L1 should 
be preserved.

Lazy regularization could maybe also be implemented in a simpler / more 
readable way by doing a full regularization of beta every "regularizationSkip" 
training steps (IIRC, this is what Leon Bottou's SvmSgd2 does, but it adds yet 
another hyperparameter to fiddle with).
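In terms of the sketch above, that periodic variant could look like the 
following train() method, assuming classify() is reduced to a plain dot 
product (no lazy pass) and an int regularizationSkip field is added; again 
purely illustrative:

{code:java}
// Variant of train() from the sketch above: no per-coefficient bookkeeping,
// just a full soft-thresholding pass over beta every regularizationSkip steps.
public void train(int[] features, double[] values, int label) {
  step++;
  double p = classify(features, values);  // classify() no longer regularizes
  double g = label - p;
  for (int k = 0; k < features.length; k++) {
    beta[features[k]] += learningRate * g * values[k];
  }
  if (step % regularizationSkip == 0) {
    // apply regularizationSkip steps' worth of L1 shrinkage in one pass
    double shrink = regularizationSkip * learningRate * lambda;
    for (int j = 0; j < beta.length; j++) {
      beta[j] = Math.signum(beta[j]) * Math.max(0.0, Math.abs(beta[j]) - shrink);
    }
  }
}
{code}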

There might also be a way to mostly keep the lazy regularization as it is and 
rethink the updateSteps update so that it does not break the sparsity of L1. 
Maybe this is just a matter of moving the step++; call after the 
classify(instance); call. I don't remember if I tried that...
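If that reordering works, the explicit eager shrinkage in the first sketch 
would fall out of regularize() itself, since the touched coefficients end up 
exactly one step behind after the increment (untested, as said; still the 
same hypothetical class):

{code:java}
// train() with step++ moved after classify(): the lazy pass inside classify()
// now only catches up to the previous step, so after the increment and the
// gradient update, regularize(j) applies exactly one step of shrinkage to the
// fresh value instead of being a no-op.
public void train(int[] features, double[] values, int label) {
  double p = classify(features, values);  // regularizes up to the old step
  step++;                                 // moved after the classify() call
  double g = label - p;
  for (int k = 0; k < features.length; k++) {
    int j = features[k];
    beta[j] += learningRate * g * values[k];
    regularize(j);  // skipped == 1: current step's L1 shrinkage hits the update
  }
}
{code}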

> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, 
> sgd-derivation.tex, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable 
> learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a 
> reasonable place to start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
