Nice to hear that your experiment is consistent with my assumption. The current L1/L2 regularization penalizes the intercept as well, which is not ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly the same solution as R, but with scalability in the number of rows and columns. Stay tuned!
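To illustrate why penalizing the intercept is not ideal, here is a minimal pure-Python sketch (toy data and a hand-rolled gradient-descent fit, not the MLlib implementation): when the labels are imbalanced and the feature is uninformative, an L2 penalty that includes the intercept shrinks it well below logit(0.9), the value that matches the base rate of positives.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, reg, penalize_intercept, lr=0.1, iters=4000):
    """Batch gradient descent for 1-D L2-regularized logistic regression."""
    w, b = 0.0, 0.0
    n = float(len(xs))
    for _ in range(iters):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        gw = gw / n + reg * w
        gb = gb / n + (reg * b if penalize_intercept else 0.0)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data (hypothetical): 90% positive labels, feature carries no signal,
# so the natural intercept is logit(0.9) ~ 2.197.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [1] * 180 + [0] * 20

w_pen, b_pen = fit_logistic(xs, ys, reg=1.0, penalize_intercept=True)
w_free, b_free = fit_logistic(xs, ys, reg=1.0, penalize_intercept=False)
print("intercept penalized:     b = %.3f" % b_pen)   # shrunk toward 0
print("intercept not penalized: b = %.3f" % b_free)  # close to logit(0.9)
```

The penalized intercept biases every predicted probability toward 0.5, which is why glmnet (and the planned port) leaves the intercept out of the regularization term.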
Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Mon, Sep 29, 2014 at 11:45 AM, Yanbo Liang <yanboha...@gmail.com> wrote:
> Thank you for all your patient responses.
>
> I can conclude that if the data is totally separable or over-fitting
> occurs, the weights may differ.
> This is also consistent with my experiment.
>
> I have evaluated two different datasets, with the results as follows:
> Loss function: LogisticGradient
> Regularizer: L2
> regParam: 1.0
> numIterations: 10000 (SGD)
>
> Dataset 1: spark-1.1.0/data/mllib/sample_binary_classification_data.txt
> # of classes: 2
> # of samples: 100
> # of features: 692
> areaUnderROC of both SGD and LBFGS reaches nearly 1.0.
> The loss of both optimization methods converges to nearly
> 1.7147811767900675E-5 (very small).
> The weights from the two optimization methods are different but look
> roughly proportional to each other (not strictly), just as DB Tsai
> mentioned above. It might be that the dataset is totally separable.
>
> Dataset 2:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#german.numer
> # of classes: 2
> # of samples: 1000
> # of features: 24
> areaUnderROC of both SGD and LBFGS is nearly 0.8.
> The loss of both optimization methods converges to nearly
> 0.5367041390107519.
> The weights from the two optimization methods are the same.
>
>
>
> 2014-09-29 16:05 GMT+08:00 DB Tsai <dbt...@dbtsai.com>:
>>
>> Can you check the loss of both the LBFGS and SGD implementations? One
>> possible reason is that SGD doesn't converge well; you can see that by
>> comparing the log-likelihoods of both. Another potential reason is that
>> the labels of your training data are totally separable, so you can
>> always increase the log-likelihood by multiplying the weights by a
>> constant.
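The separability point quoted above can be sketched in a few lines of pure Python (toy 1-D data, hypothetical, not MLlib code): on a perfectly separable dataset, multiplying a separating weight by a constant c > 1 strictly increases the unregularized log-likelihood, so there is no finite maximizer and different optimizers can stop at different, roughly proportional weights.

```python
import math

def log_likelihood(w, b, data):
    """Binary logistic log-likelihood, labels y in {0, 1}, computed stably."""
    ll = 0.0
    for x, y in data:
        z = w * x + b
        # log(1 + e^z) = max(z, 0) + log1p(e^{-|z|})
        ll += y * z - max(z, 0.0) - math.log1p(math.exp(-abs(z)))
    return ll

# Perfectly separable toy data (hypothetical): y = 1 exactly when x > 0.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

# Scale a separating weight (w = 1, b = 0) by growing constants:
for c in [1.0, 2.0, 4.0, 8.0, 16.0]:
    print("c = %4.1f  log-likelihood = %.8f" % (c, log_likelihood(c, 0.0, data)))
```

The log-likelihood increases strictly toward 0 as c grows, which matches the behavior seen on Dataset 1; adding L2 regularization (as on Dataset 2) makes the objective strongly convex, giving a unique finite solution that both solvers reach.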
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Sun, Sep 28, 2014 at 11:48 AM, Yanbo Liang <yanboha...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > We have used LogisticRegression in MLlib with two different
>> > optimization methods, SGD and LBFGS.
>> > With the same dataset and the same training and test split, we get
>> > different weight vectors.
>> >
>> > For example, we use
>> > spark-1.1.0/data/mllib/sample_binary_classification_data.txt as our
>> > training and test dataset,
>> > with LogisticRegressionWithSGD and LogisticRegressionWithLBFGS as the
>> > training methods and all other parameters the same.
>> >
>> > The precision of these two methods is almost 100% and the AUCs are
>> > also near 1.0.
>> > As far as I know, a convex optimization problem will converge to the
>> > global minimum. (We use SGD with a mini-batch fraction of 1.0.)
>> > But I got two different weight vectors. Is this expected, or does it
>> > make sense?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org