Thank you for all your patient responses.

I can conclude that if the data is totally separable or overfitting occurs,
the weights may differ.
This is also consistent with my experiments.
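To convince myself, here is a small sketch (plain Scala, not MLlib; the toy
data and object name are only for illustration) showing that on a linearly
separable set the unregularized logistic loss keeps decreasing as any
separating weight vector is scaled up, so the minimizer is not unique:

object SeparableScaling {
  // toy data: labels in {0, 1}, one feature, perfectly separated at x = 0
  val data = Seq((0.0, -2.0), (0.0, -1.0), (1.0, 1.0), (1.0, 2.0))

  // mean unregularized logistic loss for a single weight w
  def logLoss(w: Double): Double = data.map { case (y, x) =>
    val p = 1.0 / (1.0 + math.exp(-w * x))
    -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
  }.sum / data.size

  def main(args: Array[String]): Unit = {
    // the loss keeps shrinking as the weight is scaled up, so without
    // regularization there is no unique finite minimizer
    Seq(1.0, 10.0, 100.0).foreach { c =>
      println(f"w = $c%6.1f  mean log-loss = ${logLoss(c)}%.8f")
    }
  }
}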

I have evaluated two different datasets; the results are as follows:
Loss function: LogisticGradient
Regularizer: L2
regParam: 1.0
numIterations: 10000 (SGD)
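
For reference, this is roughly how the two models were trained (a sketch
against the Spark 1.1.0 MLlib API; the SparkContext setup, file path, and
variable names are illustrative, not the exact code I ran):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
import org.apache.spark.mllib.optimization.SquaredL2Updater
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext("local[*]", "lr-compare")
val data = MLUtils.loadLibSVMFile(sc,
  "data/mllib/sample_binary_classification_data.txt").cache()

// SGD: logistic gradient (default), L2 regularization, regParam = 1.0, full batch
val sgd = new LogisticRegressionWithSGD()
sgd.optimizer
  .setNumIterations(10000)
  .setRegParam(1.0)
  .setMiniBatchFraction(1.0)
  .setUpdater(new SquaredL2Updater)
val sgdModel = sgd.run(data)

// L-BFGS: same loss function and regularization
val lbfgs = new LogisticRegressionWithLBFGS()
lbfgs.optimizer
  .setRegParam(1.0)
  .setUpdater(new SquaredL2Updater)
val lbfgsModel = lbfgs.run(data)

println(sgdModel.weights)
println(lbfgsModel.weights)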

Dataset 1: spark-1.1.0/data/mllib/sample_binary_classification_data.txt
# of classes: 2
# of samples: 100
# of features: 692
areaUnderROC of both SGD and LBFGS reaches nearly 1.0
Loss of both optimization methods converges to
nearly 1.7147811767900675E-5 (very small)
The weights from the two optimization methods are different, but they look
roughly proportional to each other (one is approximately a scalar multiple of
the other), just as DB Tsai mentioned above.  This is probably because the
dataset is totally separable; a quick proportionality check is sketched below.
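(The check, using the sgdModel/lbfgsModel names from the training sketch
above:)

// element-wise ratio of the two weight vectors; if the data is separable,
// the ratios cluster around one constant rather than 1.0
val ratios = sgdModel.weights.toArray.zip(lbfgsModel.weights.toArray)
  .collect { case (a, b) if math.abs(b) > 1e-8 => a / b }
println(s"ratio min = ${ratios.min}, max = ${ratios.max}")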

Dataset 2:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#german.numer
# of classes: 2
# of samples: 1000
# of features: 24
areaUnderROC of both SGD and LBFGS is nearly 0.8
Loss of both optimization methods converges to nearly 0.5367041390107519
The weights from the two optimization methods are essentially the same.
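
For completeness, areaUnderROC was computed along these lines
(BinaryClassificationMetrics; model and data names as in the sketch above,
with data holding the german.numer LabeledPoints for this dataset;
clearThreshold() makes predict return raw scores instead of 0/1 labels):

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

sgdModel.clearThreshold()
val scoreAndLabels = data.map(p => (sgdModel.predict(p.features), p.label))
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"areaUnderROC = ${metrics.areaUnderROC()}")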



2014-09-29 16:05 GMT+08:00 DB Tsai <dbt...@dbtsai.com>:

> Can you check the loss of both the LBFGS and SGD implementations? One
> reason may be that SGD doesn't converge well, and you can see that by
> comparing both log-likelihoods. Another potential reason may be that the
> labels of your training data are totally separable, so you can always
> increase the log-likelihood by multiplying the weights by a constant.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Sun, Sep 28, 2014 at 11:48 AM, Yanbo Liang <yanboha...@gmail.com>
> wrote:
> > Hi
> >
> > We have used LogisticRegression with two different optimization
> > methods, SGD and LBFGS, in MLlib.
> > With the same dataset and the same training and test split, we get
> > different weight vectors.
> >
> > For example, we use
> > spark-1.1.0/data/mllib/sample_binary_classification_data.txt as our
> > training and test dataset.
> > With LogisticRegressionWithSGD and LogisticRegressionWithLBFGS as the
> > training methods, all other parameters being the same.
> >
> > The precision of both methods is nearly 100% and the AUCs are also
> > near 1.0.
> > As far as I know, a convex optimization problem should converge to the
> > global minimum. (We use SGD with mini-batch fraction 1.0.)
> > But I got two different weight vectors. Is this expected, or does it
> > make sense?
>
