Ok, I've tried to add the intercept term myself (code here [1]), but with no luck.
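For reference, the ones-column hack boils down to something like this (a minimal plain-Scala sketch, no Spark; the names are mine, not taken from the linked repo):

```scala
// Sketch (plain Scala, no Spark): append a constant 1.0 feature to each
// example so that the weight learned for that column plays the role of
// the intercept. Names and data here are made up for illustration.
object OnesColumn {
  def addOnes(features: Array[Array[Double]]): Array[Array[Double]] =
    features.map(row => row :+ 1.0)

  def main(args: Array[String]): Unit = {
    val x = Array(Array(0.5, -1.2), Array(0.3, 0.7))
    // Each row now carries a trailing 1.0.
    addOnes(x).foreach(row => println(row.mkString(",")))
  }
}
```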
It seems that adding a column of ones doesn't help with convergence either. I may have missed something in the code, as I'm quite a noob in Scala, but printing the data seems to indicate I succeeded in adding the ones column. Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge branch)

2014-07-07 9:08 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:

> Well, why not, but IMHO MLlib Logistic Regression is unusable right now.
> The inability to use an intercept is just a no-go. I could hack a column of
> ones to inject the intercept into the data, but frankly it's a pity to have
> to do so.
>
>
> 2014-07-05 23:04 GMT+02:00 DB Tsai <dbt...@dbtsai.com>:
>
>> You may try LBFGS to have more stable convergence. In Spark 1.1, we will
>> be able to use LBFGS instead of GD in the training process.
>>
>> On Jul 4, 2014 1:23 PM, "Thomas Robert" <tho...@creativedata.fr> wrote:
>>
>>> Hi all,
>>>
>>> I too am having some issues with the *RegressionWithSGD algorithms.
>>>
>>> Concerning your issue Eustache, this could be due to the fact that these
>>> regression algorithms use a fixed step (that is divided by
>>> sqrt(iteration)). During my tests, quite often, the algorithm diverged to an
>>> infinite cost; I guessed this was because the step was too big. I reduced it
>>> and managed to get good results on a very simple generated dataset.
>>>
>>> But I was wondering if anyone here had some advice concerning the use
>>> of these regression algorithms, for example how to choose a good step and
>>> number of iterations? I wonder if I'm using them right...
>>>
>>> Thanks,
>>>
>>> --
>>> *Thomas ROBERT*
>>> www.creativedata.fr
>>>
>>>
>>> 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>
>>>> Printing the model shows the intercept is always 0 :(
>>>>
>>>> Should I open a bug for that?
>>>>
>>>>
>>>> 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>>
>>>>> Hi list,
>>>>>
>>>>> I'm benchmarking MLlib for a regression task [1] and getting strange
>>>>> results.
>>>>>
>>>>> Namely, using RidgeRegressionWithSGD it seems the predicted points
>>>>> miss the intercept:
>>>>>
>>>>> {code}
>>>>> val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
>>>>> ...
>>>>> valuesAndPreds.take(10).map(t => println(t))
>>>>> {code}
>>>>>
>>>>> output:
>>>>>
>>>>> (2007.0,-3.784588726958493E75)
>>>>> (2003.0,-1.9562390324037716E75)
>>>>> (2005.0,-4.147413202985629E75)
>>>>> (2003.0,-1.524938024096847E75)
>>>>> ...
>>>>>
>>>>> If I change the parameters (step size, regularization and iterations)
>>>>> I get NaNs more often than not:
>>>>>
>>>>> (2007.0,NaN)
>>>>> (2003.0,NaN)
>>>>> (2005.0,NaN)
>>>>> ...
>>>>>
>>>>> On the other hand, the DecisionTree model gives sensible results.
>>>>>
>>>>> I see there is a `setIntercept()` method in the abstract class
>>>>> GeneralizedLinearAlgorithm that seems to trigger the use of the
>>>>> intercept, but I'm unable to use it from the public interface :(
>>>>>
>>>>> Any help appreciated :)
>>>>>
>>>>> Eustache
>>>>>
>>>>> [1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
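For what it's worth, Thomas's step-size explanation is easy to reproduce outside Spark. Below is a toy sketch (plain Scala, no MLlib; the data and step sizes are made up) of gradient descent on a one-dimensional least-squares problem with a step of stepSize/sqrt(iteration), showing the same blow-up to huge values or NaN when the step is too big:

```scala
// Toy illustration (plain Scala, no Spark): gradient descent on the mean
// squared error of a 1-D linear model, with the step shrunk by sqrt(t)
// as Thomas describes. A small step converges; a large one diverges.
object StepSizeDemo {
  val xs = Array(10.0, 20.0, 30.0)
  val ys = xs.map(_ * 2.0) // true weight is 2.0, no noise

  def fit(stepSize: Double, iterations: Int): Double = {
    var w = 0.0
    for (t <- 1 to iterations) {
      // Gradient of the mean squared error at the current weight.
      val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.length
      w -= stepSize / math.sqrt(t) * grad
    }
    w
  }

  def main(args: Array[String]): Unit = {
    println(fit(0.001, 200)) // small step: ends up close to 2.0
    println(fit(1.0, 200))   // big step: blows up (huge magnitude or NaN)
  }
}
```

This matches the symptoms above: with a too-large step the iterates oscillate with growing magnitude, which shows up as predictions around 1E75 and eventually NaN once the doubles overflow.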