Github user mktal commented on the issue:
https://github.com/apache/incubator-madlib/pull/75
    This is a good point Aaron. In terms of convergence behavior, this approach combines the benefit of mini-batches, which iterate fast, with that of a large batch size, which reduces the variance of the empirical objective. To see this, note that we run multiple epochs within each buffer, and given enough epochs we solve that buffer's objective accurately, which can in turn be seen as applying multiple updates toward the minimizer of the objective formulated from that buffer. That is why a single invocation of the UDA already gives a pretty good solution. I encourage you to test it yourself; I will also run more experiments in the future.
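    To make the buffering scheme concrete, here is a minimal sketch in Python (not the actual MADlib/UDA code); `buffers`, `grad`, `n_epochs`, and `lr` are hypothetical names used only for illustration:
    ```python
    def train_one_pass(w, buffers, grad, n_epochs=10, lr=0.01):
        """Hypothetical sketch: run several epochs of gradient steps within
        each buffer, so a single scan over the buffers (one UDA invocation)
        already takes many steps toward each buffer's minimizer."""
        for X, y in buffers:                  # one pass over the buffered data
            for _ in range(n_epochs):         # multiple epochs within the buffer
                w = w - lr * grad(w, X, y)    # step toward this buffer's minimizer
        return w
    ```
    The inner loop is what distinguishes this from plain mini-batch SGD: each buffer's objective is solved (approximately) before moving on, which is why one pass can suffice.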
    Regarding the gradient, I think in this case it is not a very good convergence indicator, since the hinge loss is not smooth: the loss may stop changing much while the (sub)gradient is still far from zero. One might also run into overfitting by forcing the gradient to be very small. So in this case the loss value might be a better indicator of a good solution (not necessarily the optimal solution, but one with small generalization error).
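    As a concrete illustration of the non-smoothness (a minimal sketch, not the PR's code; `hinge_loss` and `hinge_subgrad` are hypothetical helpers):
    ```python
    import numpy as np

    def hinge_loss(w, X, y):
        # average hinge loss: mean(max(0, 1 - y_i * <w, x_i>))
        return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

    def hinge_subgrad(w, X, y):
        # one valid subgradient: -y_i * x_i where the margin is violated, else 0
        active = (y * (X @ w)) < 1.0
        return -(X[active] * y[active, None]).sum(axis=0) / len(y)
    ```
    At the kink where `y_i * (X @ w) == 1`, the valid subgradients for point i range from `0` to `-y_i * x_i`, so the subgradient can stay bounded away from zero even as the loss plateaus.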