Github user mktal commented on the issue:

    https://github.com/apache/incubator-madlib/pull/75
  
    This is a good point, Aaron. In terms of convergence behavior, the scheme has both the benefit of mini-batches, which iterate fast, and of a large batch size, which reduces the variance of the empirical objective. To see this, note that we run multiple epochs within each buffer, and given enough epochs we solve that buffer accurately, which can in turn be seen as applying multiple updates towards the minimizer of the objective formulated from that buffer. That is why a single invocation of the UDA already gives us a pretty good solution. I encourage you to test it yourself; I will also run more experiments in the future.
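
    To sketch the idea (illustrative Python only, not the actual MADlib UDA code; `buffers`, `grad_fn`, and `n_epochs_per_buffer` are placeholders for this example):

```python
import numpy as np

def sgd_over_buffers(buffers, w, grad_fn, lr=0.01, n_epochs_per_buffer=5):
    """One pass over pre-loaded buffers (one hypothetical 'UDA invocation').

    Several epochs are run inside each buffer, so the objective restricted
    to that buffer is solved fairly accurately before moving on, rather
    than taking a single noisy step per row.
    """
    for X, y in buffers:                       # each buffer holds many rows
        for _ in range(n_epochs_per_buffer):   # multiple epochs per buffer
            for i in np.random.permutation(len(y)):
                w = w - lr * grad_fn(w, X[i], y[i])
    return w
```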
    
    Regarding the gradient, I think in this case it is not a very good convergence indicator, since the hinge loss is not a smooth function: even when the loss has stopped changing much, the (sub)gradient can still be far from zero. One might also run into overfitting by forcing the gradient to be very small. So in this case the loss value might be a better indicator of a good solution (not necessarily the optimal solution, but one with small generalization error).
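
    As a concrete illustration (a toy example, not from this PR; the data, step size, and the particular subgradient selection are all assumptions): with plain subgradient descent on a linear SVM, the loss plateaus while the norm of the chosen subgradient stays bounded away from zero, because the hinge is non-differentiable at the margin.

```python
import numpy as np

def hinge_loss_and_subgrad(w, X, y, lam=0.1):
    """L2-regularized linear hinge loss and one valid subgradient."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0                       # points violating the margin
    loss = np.mean(np.maximum(margins, 0.0)) + 0.5 * lam * (w @ w)
    subgrad = -(X[active].T @ y[active]) / len(y) + lam * w
    return loss, subgrad

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
w = np.zeros(2)
for t in range(1, 2001):                       # plain subgradient descent
    loss, g = hinge_loss_and_subgrad(w, X, y)
    w -= g / t                                 # diminishing step size
print(loss, np.linalg.norm(g))                 # loss plateaus; ||g|| need not -> 0
```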

