[
https://issues.apache.org/jira/browse/MADLIB-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732994#comment-15732994
]
Xiaocheng Tang commented on MADLIB-1049:
----------------------------------------
- stepsize; also called the learning rate. If not chosen properly, it can cause
the learning to diverge. It is often hard to tell which value is best, but at
least a warning should be emitted when divergence is detected. For example, one
could monitor the loss value and make sure it is not growing to an abnormally
large value (see the loss-monitoring sketch after this list).
- squared hinge loss; it behaves quite differently from the standard hinge loss
because of the squared smoothing effect, so a stepsize that works well for svm
or logreg might not work well here (see the loss/gradient sketch after this
list).
- nEpoch; a proper value depends on the buffer size, i.e., how many training
rows are in one learning buffer. The larger the buffer, the larger a value
nEpoch can take before overfitting sets in. In my experience a value below 10
is a safe choice; more experiments are needed before concrete recommendations
can be made.
- intercept; because of regularization, the intercept term needs to be handled
explicitly so that it is not penalized the way the weights are (see the
regularization sketch after this list).
- trans(x); the buffer is transposed and copied before being fed into the
training algorithm. The layout of the model (along with the implementations of
lossAndGradient) needs to be changed accordingly if the transpose and copy are
to be avoided, i.e., if `MappedMatrix` is to be used.
- labels; assumed to be consecutive nonnegative integers starting from 0. This
assumption should be verified before calling the UDA training function (see the
validation sketch after this list).
- batch_size; a larger batch means a more accurate gradient update but also
takes more time to compute. Putting m examples in a minibatch costs O(m)
computation and O(m) memory, yet reduces the uncertainty in the gradient by a
factor of only O(sqrt(m)), so there are diminishing marginal returns to putting
more examples in the minibatch (see the minibatch-noise sketch after this
list). The theoretical reason behind the benefit of mini-batches is still an
active research topic and has to do with large-batch methods often converging
to sharp minima that lead to [poor
generalization](https://arxiv.org/abs/1609.04836).
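
Below are a few illustrative sketches for the items above. They are plain
Python/numpy sketches written for this comment, not MADlib code; all function
and parameter names are made up for illustration.

A minimal loss-monitoring check for the stepsize item: flag divergence when the
loss becomes non-finite or blows up relative to the best loss seen so far
(`window` and `blowup_factor` are arbitrary illustrative thresholds).

```
import math

def check_divergence(loss_history, window=5, blowup_factor=1e3):
    """Return True if the training loss is non-finite or is exploding."""
    if not loss_history:
        return False
    last = loss_history[-1]
    if math.isnan(last) or math.isinf(last):
        return True
    if len(loss_history) > window:
        # Compare the latest loss with the best loss seen before the last few
        # iterations; a huge ratio is a strong sign the stepsize is too large.
        baseline = min(loss_history[:-window])
        if baseline > 0 and last > blowup_factor * baseline:
            return True
    return False
```

The driver could run such a check after every iteration and emit a warning
suggesting a smaller stepsize whenever it returns True.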
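For the squared hinge item, the difference is easy to see from the per-example
loss and gradient in the binary case (labels in {-1, +1}, margin = y * w.x, x
and w numpy vectors): the squared hinge gradient scales with the size of the
margin violation, while the standard hinge subgradient does not, which is why
the same stepsize can behave very differently.

```
import numpy as np

def hinge_loss_grad(w, x, y):
    """Standard hinge max(0, 1 - y*w.x) and a subgradient w.r.t. w."""
    margin = y * np.dot(w, x)
    if margin >= 1.0:
        return 0.0, np.zeros_like(w)
    return 1.0 - margin, -y * x

def squared_hinge_loss_grad(w, x, y):
    """Squared hinge max(0, 1 - y*w.x)**2 and its gradient w.r.t. w."""
    margin = y * np.dot(w, x)
    if margin >= 1.0:
        return 0.0, np.zeros_like(w)
    violation = 1.0 - margin
    # The gradient grows with the violation, unlike the constant-size hinge
    # subgradient, so the safe stepsize range is different.
    return violation ** 2, -2.0 * violation * y * x
```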
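For the intercept item, a sketch assuming L2 regularization and a model vector
laid out as [weights..., intercept] (the layout is an assumption made only for
this example): the penalty and its gradient simply skip the intercept entry.

```
import numpy as np

def l2_penalty_and_grad(model, lam):
    """L2 penalty on the weights only; model[-1] is taken to be the intercept."""
    weights = model[:-1]
    penalty = 0.5 * lam * np.dot(weights, weights)
    grad = lam * model        # new array; safe to modify in place
    grad[-1] = 0.0            # do not shrink the intercept
    return penalty, grad
```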
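For the labels item, an illustrative validation that could run (e.g. in the
Python driver) before the UDA training function is called; it only checks that
the labels are integers in [0, num_classes).

```
import numpy as np

def validate_labels(labels, num_classes):
    """Check that labels are integers in the range [0, num_classes)."""
    labels = np.asarray(labels)
    if not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be integer-valued, got dtype %s"
                         % labels.dtype)
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError("labels must lie in [0, %d], got range [%d, %d]"
                         % (num_classes - 1, labels.min(), labels.max()))
```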
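Finally, the O(sqrt(m)) claim in the batch_size item can be checked with a
small self-contained experiment (a least-squares loss is used purely for
illustration): the per-batch cost grows like m, but the gradient noise only
shrinks like 1/sqrt(m).

```
import numpy as np

rng = np.random.default_rng(0)
n, d = 100000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

w = np.zeros(d)                              # arbitrary evaluation point
full_grad = 2.0 * X.T @ (X @ w - y) / n      # exact full-data gradient

for m in (1, 10, 100, 1000):
    errs = []
    for _ in range(200):
        idx = rng.choice(n, size=m, replace=False)
        g = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / m
        errs.append(np.linalg.norm(g - full_grad))
    # The error drops roughly by sqrt(10) for every 10x increase in batch size.
    print("batch_size %4d: mean gradient error %.3f" % (m, np.mean(errs)))
```

Going from m = 10 to m = 1000 costs about 100x more per batch but only cuts
the gradient error by roughly a factor of 10.
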
> Create generic multi-class classifier
> -------------------------------------
>
> Key: MADLIB-1049
> URL: https://issues.apache.org/jira/browse/MADLIB-1049
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Multiclass Classifier
> Reporter: Frank McQuillan
> Fix For: v1.10
>
>
> C++ part
> Single model that supports loss function as a parameter.
> Loss functions to support: squared hinge loss (SVM) and cross entropy
> (multinomial logistic regression).