[ 
https://issues.apache.org/jira/browse/OPENNLP-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977796#comment-13977796
 ] 

Vinh Khuc commented on OPENNLP-671:
-----------------------------------

Attached is the patch for L1-LBFGS. It also includes an implementation of 
ElasticNet (i.e. L1 and L2 combined). During L-BFGS training:
if L1Cost > 0 and L2Cost = 0, L1-regularization is used,
if L1Cost = 0 and L2Cost > 0, L2-regularization is used,
and if both costs are > 0, ElasticNet is used.
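
To make the selection rule concrete, here is a minimal sketch in plain Java. It 
is not code from the patch: the class and method names are hypothetical, and 
the penalty form l1Cost * |w|_1 + l2Cost * |w|_2^2 is an assumption (the patch 
may scale the terms differently).

```java
// Hypothetical illustration only; not taken from the attached patch.
class RegularizationSketch {

    /** Names the regularizer that a given pair of costs would select. */
    static String regularizerFor(double l1Cost, double l2Cost) {
        if (l1Cost < 0 || l2Cost < 0) {
            throw new IllegalArgumentException("costs must be non-negative");
        }
        if (l1Cost > 0 && l2Cost > 0) {
            return "ElasticNet";
        }
        if (l1Cost > 0) {
            return "L1";
        }
        if (l2Cost > 0) {
            return "L2";
        }
        return "none";
    }

    /**
     * Penalty added to the negative log-likelihood.
     * Assumed form: l1Cost * sum(|w_i|) + l2Cost * sum(w_i^2).
     */
    static double penalty(double[] w, double l1Cost, double l2Cost) {
        double l1Norm = 0.0;
        double l2NormSquared = 0.0;
        for (double wi : w) {
            l1Norm += Math.abs(wi);
            l2NormSquared += wi * wi;
        }
        return l1Cost * l1Norm + l2Cost * l2NormSquared;
    }
}
```

With both costs set to zero the penalty vanishes and training is unregularized.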

As shown in the attached log files, L1-regularization gives very good accuracy 
for the NL-PER data set. Moreover, the trained model is much smaller than the 
one trained with L2-regularization. L1 works well for NL-PER since the number 
of features/contexts is much larger than the number of training instances.

However, when the number of training instances is larger than the number of 
features, L2 tends to work better than L1. ElasticNet is added to solve this 
problem by combining the advantages of L1 and L2.
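
As a rough illustration of how the two penalties can be combined in one 
quasi-Newton step, the sketch below computes the pseudo-gradient used by the 
OWL-QN method from the paper linked in the issue description, with the smooth 
L2 term folded into the loss gradient first. The class and method names are 
hypothetical, the penalty form l1Cost * |w|_1 + l2Cost * |w|_2^2 is an 
assumption, and whether the patch is organized this way is not confirmed here.

```java
// Hypothetical illustration only; not taken from the attached patch.
class PseudoGradientSketch {

    /**
     * Pseudo-gradient of loss(w) + l2Cost * sum(w_i^2) + l1Cost * sum(|w_i|).
     *
     * @param w        current parameter vector
     * @param lossGrad gradient of the unregularized loss at w
     */
    static double[] pseudoGradient(double[] w, double[] lossGrad,
                                   double l1Cost, double l2Cost) {
        double[] pg = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            // The L2 term is differentiable, so it is simply added to the gradient.
            double g = lossGrad[i] + 2.0 * l2Cost * w[i];
            if (w[i] > 0) {
                pg[i] = g + l1Cost;
            } else if (w[i] < 0) {
                pg[i] = g - l1Cost;
            } else if (g + l1Cost < 0) {
                // At zero, the right-hand derivative is negative: increasing w_i helps.
                pg[i] = g + l1Cost;
            } else if (g - l1Cost > 0) {
                // At zero, the left-hand derivative is positive: decreasing w_i helps.
                pg[i] = g - l1Cost;
            } else {
                // Zero is locally optimal for this coordinate, so it stays at zero.
                pg[i] = 0.0;
            }
        }
        return pg;
    }
}
```

The last branch is where the sparsity comes from: a weight at zero only moves 
if the loss gradient outweighs the L1 cost.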

I also moved the L-BFGS-based convex optimization solver into the QNMinimizer 
class so that it can be used for other purposes. A usage example is given in 
its class description.

Finally, I did some code cleanup to make the source code easier to maintain.

> Add L1-regularization into L-BFGS
> ---------------------------------
>
>                 Key: OPENNLP-671
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-671
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Machine Learning
>            Reporter: Vinh Khuc
>         Attachments: L1-ElasticNet-LBFGS.patch, nl-per-testa-l1.log, 
> nl-per-testb-l1.log, nl-per-train-l1.log, qn-trainer-l1.params
>
>
> L1-regularization is useful when training Maximum Entropy models since it 
> pushes the parameters of irrelevant features to zero. Hence, the parameter 
> vector will be sparse and the trained model will be compact. 
> When the number of features is much larger than the number of training 
> examples, L1 often gives better accuracy than L2.
> The implementation of L1-regularization for L-BFGS will follow the method 
> described in the paper:
> http://research.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)
