Ok, I've tried to add the intercept term myself (code here [1]), but with no luck.
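For reference, the ones-column hack boils down to something like this (a minimal plain-Scala sketch, no Spark; the names are mine, not taken from the linked repo):

```scala
// Sketch (plain Scala, no Spark): append a constant 1.0 feature to each
// example so that the weight learned for that column plays the role of
// the intercept. Names and data here are made up for illustration.
object OnesColumn {
  def addOnes(features: Array[Array[Double]]): Array[Array[Double]] =
    features.map(row => row :+ 1.0)

  def main(args: Array[String]): Unit = {
    val x = Array(Array(0.5, -1.2), Array(0.3, 0.7))
    // Each row now carries a trailing 1.0.
    addOnes(x).foreach(row => println(row.mkString(",")))
  }
}
```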
It seems that adding a column of ones doesn't help with convergence either. I may have missed something in the code, as I'm quite a noob in Scala, but printing the data seems to indicate I succeeded in adding the ones column. Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge branch)

2014-07-07 9:08 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:

> Well, why not, but IMHO MLlib Logistic Regression is unusable right now.
> The inability to use an intercept is just a no-go. I could hack a column of
> ones to inject the intercept into the data, but frankly it's a pity to have
> to do so.
>
>
> 2014-07-05 23:04 GMT+02:00 DB Tsai <dbt...@dbtsai.com>:
>
>> You may try LBFGS to have more stable convergence. In Spark 1.1, we will
>> be able to use LBFGS instead of GD in the training process.
>>
>> On Jul 4, 2014 1:23 PM, "Thomas Robert" <tho...@creativedata.fr> wrote:
>>
>>> Hi all,
>>>
>>> I too am having some issues with the *RegressionWithSGD algorithms.
>>>
>>> Concerning your issue Eustache, this could be due to the fact that these
>>> regression algorithms use a fixed step (that is divided by
>>> sqrt(iteration)). During my tests, quite often, the algorithm diverged to an
>>> infinite cost; I guessed this was because the step was too big. I reduced it
>>> and managed to get good results on a very simple generated dataset.
>>>
>>> But I was wondering if anyone here had some advice concerning the use
>>> of these regression algorithms, for example how to choose a good step and
>>> number of iterations? I wonder if I'm using them right...
>>>
>>> Thanks,
>>>
>>> --
>>> *Thomas ROBERT*
>>> www.creativedata.fr
>>>
>>>
>>> 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>
>>>> Printing the model shows the intercept is always 0 :(
>>>>
>>>> Should I open a bug for that?
>>>>
>>>>
>>>> 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>>
>>>>> Hi list,
>>>>>
>>>>> I'm benchmarking MLlib for a regression task [1] and getting strange
>>>>> results.
>>>>>
>>>>> Namely, using RidgeRegressionWithSGD it seems the predicted points
>>>>> miss the intercept:
>>>>>
>>>>> {code}
>>>>> val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
>>>>> ...
>>>>> valuesAndPreds.take(10).map(t => println(t))
>>>>> {code}
>>>>>
>>>>> output:
>>>>>
>>>>> (2007.0,-3.784588726958493E75)
>>>>> (2003.0,-1.9562390324037716E75)
>>>>> (2005.0,-4.147413202985629E75)
>>>>> (2003.0,-1.524938024096847E75)
>>>>> ...
>>>>>
>>>>> If I change the parameters (step size, regularization and iterations)
>>>>> I get NaNs more often than not:
>>>>>
>>>>> (2007.0,NaN)
>>>>> (2003.0,NaN)
>>>>> (2005.0,NaN)
>>>>> ...
>>>>>
>>>>> On the other hand, the DecisionTree model gives sensible results.
>>>>>
>>>>> I see there is a `setIntercept()` method in the abstract class
>>>>> GeneralizedLinearAlgorithm that seems to trigger the use of the
>>>>> intercept, but I'm unable to use it from the public interface :(
>>>>>
>>>>> Any help appreciated :)
>>>>>
>>>>> Eustache
>>>>>
>>>>> [1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
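For what it's worth, Thomas's step-size explanation is easy to reproduce outside Spark. Below is a toy sketch (plain Scala, no MLlib; the data and step sizes are made up) of gradient descent on a one-dimensional least-squares problem with a step of stepSize/sqrt(iteration), showing the same blow-up to huge values or NaN when the step is too big:

```scala
// Toy illustration (plain Scala, no Spark): gradient descent on the mean
// squared error of a 1-D linear model, with the step shrunk by sqrt(t)
// as Thomas describes. A small step converges; a large one diverges.
object StepSizeDemo {
  val xs = Array(10.0, 20.0, 30.0)
  val ys = xs.map(_ * 2.0) // true weight is 2.0, no noise

  def fit(stepSize: Double, iterations: Int): Double = {
    var w = 0.0
    for (t <- 1 to iterations) {
      // Gradient of the mean squared error at the current weight.
      val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.length
      w -= stepSize / math.sqrt(t) * grad
    }
    w
  }

  def main(args: Array[String]): Unit = {
    println(fit(0.001, 200)) // small step: ends up close to 2.0
    println(fit(1.0, 200))   // big step: blows up (huge magnitude or NaN)
  }
}
```

This matches the symptoms above: with a too-large step the iterates oscillate with growing magnitude, which shows up as predictions around 1E75 and eventually NaN once the doubles overflow.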