Well, why not, but IMHO MLlib Logistic Regression is unusable right now. The inability to use an intercept is just a no-go. I could hack a column of ones into the data to inject the intercept, but frankly it's a pity to have to do so.
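For the record, a minimal sketch of that column-of-ones hack in plain Scala (no Spark dependencies; `addBias` is a made-up helper name, not part of MLlib):

```scala
// Prepend a constant 1.0 feature to each example so the weight learned
// for that column plays the role of the intercept.
def addBias(features: Array[Double]): Array[Double] = 1.0 +: features

// Example: a two-feature point gains a leading bias term.
val withBias = addBias(Array(2.5, -1.0))
println(withBias.mkString(", ")) // 1.0, 2.5, -1.0
```

In Spark you'd apply this inside the map that builds each LabeledPoint's feature vector, before training.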
2014-07-05 23:04 GMT+02:00 DB Tsai <dbt...@dbtsai.com>:

> You may try LBFGS to get more stable convergence. In Spark 1.1, we will
> be able to use LBFGS instead of GD in the training process.
>
> On Jul 4, 2014 1:23 PM, "Thomas Robert" <tho...@creativedata.fr> wrote:
>
>> Hi all,
>>
>> I too am having some issues with the *RegressionWithSGD algorithms.
>>
>> Concerning your issue Eustache, this could be due to the fact that these
>> regression algorithms use a fixed step (that is divided by
>> sqrt(iteration)). During my tests, quite often, the algorithm diverged to
>> an infinite cost, I guess because the step was too big. I reduced it and
>> managed to get good results on a very simple generated dataset.
>>
>> But I was wondering if anyone here had some advice concerning the use of
>> these regression algorithms, for example how to choose a good step size
>> and number of iterations? I wonder if I'm using them right...
>>
>> Thanks,
>>
>> --
>>
>> *Thomas ROBERT*
>> www.creativedata.fr
>>
>> 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>
>>> Printing the model shows the intercept is always 0 :(
>>>
>>> Should I open a bug for that?
>>>
>>> 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>
>>>> Hi list,
>>>>
>>>> I'm benchmarking MLlib on a regression task [1] and get strange
>>>> results.
>>>>
>>>> Namely, using RidgeRegressionWithSGD it seems the predicted points
>>>> miss the intercept:
>>>>
>>>> {code}
>>>> val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
>>>> ...
>>>> valuesAndPreds.take(10).map(t => println(t))
>>>> {code}
>>>>
>>>> output:
>>>>
>>>> (2007.0,-3.784588726958493E75)
>>>> (2003.0,-1.9562390324037716E75)
>>>> (2005.0,-4.147413202985629E75)
>>>> (2003.0,-1.524938024096847E75)
>>>> ...
>>>>
>>>> If I change the parameters (step size, regularization, iterations)
>>>> I get NaNs more often than not:
>>>>
>>>> (2007.0,NaN)
>>>> (2003.0,NaN)
>>>> (2005.0,NaN)
>>>> ...
>>>>
>>>> On the other hand, the DecisionTree model gives sensible results.
>>>>
>>>> I see there is a `setIntercept()` method in the abstract class
>>>> GeneralizedLinearAlgorithm that seems to trigger the use of the
>>>> intercept, but I'm unable to reach it from the public interface :(
>>>>
>>>> Any help appreciated :)
>>>>
>>>> Eustache
>>>>
>>>> [1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
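One possible route to the `setIntercept()` method quoted above (an untested sketch against the Spark 1.x MLlib API: instantiate the algorithm class instead of calling the companion object's train(), since the class inherits setIntercept() from GeneralizedLinearAlgorithm; the step size and iteration count here are illustrative guesses, not tuned values):

```scala
import org.apache.spark.mllib.regression.{LabeledPoint, RidgeRegressionWithSGD}
import org.apache.spark.mllib.regression.RidgeRegressionModel
import org.apache.spark.rdd.RDD

// Sketch (untested): build the algorithm object by hand so that
// setIntercept(true) is reachable, and tune the optimizer directly.
def trainWithIntercept(data: RDD[LabeledPoint]): RidgeRegressionModel = {
  val alg = new RidgeRegressionWithSGD()
  alg.setIntercept(true)          // fit the intercept instead of forcing 0
  alg.optimizer
    .setNumIterations(1000)
    .setStepSize(0.01)            // smaller step to avoid the divergence/NaNs seen above
  alg.run(data)
}
```

Scaling the features (e.g. to zero mean and unit variance) before training should also help SGD converge with a fixed step on a dataset like YearPredictionMSD, whose targets are years around 2000.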