[ https://issues.apache.org/jira/browse/MAHOUT-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036009#comment-13036009 ]
Ted Dunning commented on MAHOUT-703:
------------------------------------

It comes almost for free with SGD neural net codes to put L1 and L2 penalties in as well. I would recommend it. The trick is that you can't depend on the gradient being sparse, so you can't use lazy regularization. Léon Bottou describes a stochastic full regularization with an adjusted learning rate which should perform comparably. He mostly talks about weight decay (which is L2 regularization), which can be handled cleverly by keeping a multiplier and a vector. I think L1 is important, but it requires something like truncated constant decay, which can't be done with a multiplier.

See http://leon.bottou.org/projects/sgd

> Implement Gradient machine
> --------------------------
>
>                 Key: MAHOUT-703
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-703
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Hector Yee
>            Priority: Minor
>              Labels: features
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Implement a gradient machine (aka 'neural network') that can be used for
> classification or auto-encoding.
> It will just have an input layer, an identity, sigmoid, or tanh hidden layer,
> and an output layer.
> Training is done by stochastic gradient descent (possibly mini-batch later).
> Sparsity will be optionally enforced by tweaking the bias in the hidden unit.
> For now it will go in classifier/sgd and the auto-encoder will wrap it in the
> filter unit later on.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
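
To make the regularization point above concrete, here is a minimal Java sketch (not Mahout code; the class, field names, and the exact truncation rule are illustrative) of why L2 weight decay can be folded into a scalar multiplier while L1-style truncation has to visit every weight:

{code:java}
// Minimal sketch: with dense gradients, lazy per-feature regularization is
// unavailable, but L2 weight decay can still be applied in O(1) per step by
// folding the decay into a scalar multiplier, so the stored vector is only
// touched for the gradient update. L1 truncation has no scalar equivalent.
public class DecaySketch {

  private final double[] weights;   // true weight i is multiplier * weights[i]
  private double multiplier = 1.0;

  DecaySketch(int dim) {
    this.weights = new double[dim];
  }

  /** One SGD step with L2 weight decay handled via the multiplier. */
  void stepL2(double[] gradient, double learningRate, double lambda) {
    // w <- (1 - eta * lambda) * w is just a scalar update on the multiplier.
    multiplier *= (1.0 - learningRate * lambda);
    // The gradient step is rescaled so it applies to the *true* weights.
    for (int i = 0; i < weights.length; i++) {
      weights[i] -= learningRate * gradient[i] / multiplier;
    }
  }

  /** One SGD step with L1 (truncated constant decay); no multiplier shortcut. */
  void stepL1(double[] gradient, double learningRate, double lambda) {
    double shrink = learningRate * lambda;
    for (int i = 0; i < weights.length; i++) {
      double w = multiplier * weights[i] - learningRate * gradient[i];
      // Shrink toward zero by a constant amount and truncate at zero.
      w = Math.signum(w) * Math.max(0.0, Math.abs(w) - shrink);
      weights[i] = w / multiplier;
    }
  }

  double weight(int i) {
    return multiplier * weights[i];
  }
}
{code}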
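And a rough sketch of the shape the gradient machine in the description could take, assuming a tanh hidden layer, a linear output layer, squared-error loss, and plain per-example SGD; the class and method names are hypothetical, not the eventual Mahout API:

{code:java}
// One-hidden-layer gradient machine: input -> tanh hidden -> linear output,
// trained by stochastic gradient descent, one example at a time.
public class GradientMachineSketch {

  private final double[][] w1;  // hidden x input weights
  private final double[] b1;    // hidden biases
  private final double[][] w2;  // output x hidden weights
  private final double[] b2;    // output biases
  private final java.util.Random rng = new java.util.Random(42);

  GradientMachineSketch(int in, int hidden, int out) {
    w1 = new double[hidden][in];
    b1 = new double[hidden];
    w2 = new double[out][hidden];
    b2 = new double[out];
    for (double[] row : w1) for (int j = 0; j < in; j++) row[j] = 0.01 * rng.nextGaussian();
    for (double[] row : w2) for (int j = 0; j < hidden; j++) row[j] = 0.01 * rng.nextGaussian();
  }

  /** Forward pass: tanh hidden layer, linear output; fills hiddenOut as a side effect. */
  double[] predict(double[] x, double[] hiddenOut) {
    for (int h = 0; h < b1.length; h++) {
      double s = b1[h];
      for (int i = 0; i < x.length; i++) s += w1[h][i] * x[i];
      hiddenOut[h] = Math.tanh(s);
    }
    double[] y = new double[b2.length];
    for (int o = 0; o < b2.length; o++) {
      double s = b2[o];
      for (int h = 0; h < hiddenOut.length; h++) s += w2[o][h] * hiddenOut[h];
      y[o] = s;
    }
    return y;
  }

  /** One SGD step on squared error for a single (x, target) example. */
  void train(double[] x, double[] target, double eta) {
    double[] hidden = new double[b1.length];
    double[] y = predict(x, hidden);

    // Output deltas: d(loss)/d(output) for squared error.
    double[] outDelta = new double[y.length];
    for (int o = 0; o < y.length; o++) outDelta[o] = y[o] - target[o];

    // Hidden deltas: back-propagate through the tanh nonlinearity.
    double[] hidDelta = new double[hidden.length];
    for (int h = 0; h < hidden.length; h++) {
      double s = 0.0;
      for (int o = 0; o < y.length; o++) s += outDelta[o] * w2[o][h];
      hidDelta[h] = s * (1.0 - hidden[h] * hidden[h]);
    }

    // Gradient updates for both layers.
    for (int o = 0; o < y.length; o++) {
      b2[o] -= eta * outDelta[o];
      for (int h = 0; h < hidden.length; h++) w2[o][h] -= eta * outDelta[o] * hidden[h];
    }
    for (int h = 0; h < hidden.length; h++) {
      b1[h] -= eta * hidDelta[h];
      for (int i = 0; i < x.length; i++) w1[h][i] -= eta * hidDelta[h] * x[i];
    }
  }
}
{code}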