Well, why not, but IMHO MLlib Logistic Regression is unusable right now.
The inability to use an intercept is just a no-go. I could hack a column of
ones to inject the intercept into the data, but frankly it's a pity to have
to do so.
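
To be concrete, the column-of-ones hack would look roughly like the sketch
below, in plain Scala. The names `withBias` and `predict` are hypothetical
helpers made up for illustration, not MLlib API. With MLlib the same
transformation would presumably be applied to each feature vector before
training. One caveat: with a regularized model such as ridge regression, the
weight on the constant feature is penalized like any other weight, so this is
not quite equivalent to a true intercept.

```scala
// Hypothetical helper (not part of MLlib): append a constant 1.0 feature so
// that the last learned weight plays the role of the intercept.
def withBias(features: Array[Double]): Array[Double] = features :+ 1.0

// Linear prediction with augmented weights: dot(weights, withBias(features)).
// The last entry of `weights` acts as the intercept term.
def predict(weights: Array[Double], features: Array[Double]): Double =
  weights.zip(withBias(features)).map { case (w, x) => w * x }.sum
```

The model is then trained on the augmented vectors, and the last weight is
read back as the intercept.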


2014-07-05 23:04 GMT+02:00 DB Tsai <dbt...@dbtsai.com>:

> You may try LBFGS to have more stable convergence. In Spark 1.1, we will
> be able to use LBFGS instead of GD in the training process.
> On Jul 4, 2014 1:23 PM, "Thomas Robert" <tho...@creativedata.fr> wrote:
>
>> Hi all,
>>
>> I too am having some issues with *RegressionWithSGD algorithms.
>>
>> Concerning your issue, Eustache, this could be due to the fact that these
>> regression algorithms use a fixed step (that is divided by
>> sqrt(iteration)). During my tests, quite often, the algorithm diverged to an
>> infinite cost, I guess because the step was too big. I reduced it and
>> managed to get good results on a very simple generated dataset.
>>
>> But I was wondering if anyone here had some advice concerning the use of
>> these regression algorithms, for example how to choose a good step and
>> number of iterations? I wonder if I'm using those right...
>>
>> Thanks,
>>
>> --
>>
>> *Thomas ROBERT*
>> www.creativedata.fr
>>
>>
>> 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>
>>> Printing the model shows the intercept is always 0 :(
>>>
>>> Should I open a bug for that ?
>>>
>>>
>>> 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>>>
>>>> Hi list,
>>>>
>>>> I'm benchmarking MLlib for a regression task [1] and get strange
>>>> results.
>>>>
>>>> Namely, using RidgeRegressionWithSGD it seems the predicted points miss
>>>> the intercept:
>>>>
>>>> {code}
>>>> val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
>>>> ...
>>>> valuesAndPreds.take(10).foreach(println)
>>>> {code}
>>>>
>>>> output:
>>>>
>>>> (2007.0,-3.784588726958493E75)
>>>> (2003.0,-1.9562390324037716E75)
>>>> (2005.0,-4.147413202985629E75)
>>>> (2003.0,-1.524938024096847E75)
>>>> ...
>>>>
>>>> If I change the parameters (step size, regularization and iterations) I
>>>> get NaNs more often than not:
>>>> (2007.0,NaN)
>>>> (2003.0,NaN)
>>>> (2005.0,NaN)
>>>> ...
>>>>
>>>> On the other hand, the DecisionTree model gives sensible results.
>>>>
>>>> I see there is a `setIntercept()` method in the abstract class
>>>> GeneralizedLinearAlgorithm that seems to trigger the use of the intercept,
>>>> but I'm unable to use it from the public interface :(
>>>>
>>>> Any help appreciated :)
>>>>
>>>> Eustache
>>>>
>>>> [1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
>>>>
>>>
>>
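
On the step-size point in Thomas's quoted message (a fixed step divided by
sqrt(iteration)), here is a toy sketch of why too large a step blows up. It
runs plain gradient descent on a one-dimensional quadratic, which is an
assumption made for illustration and not the MLlib code path. `runGd` and the
curvature parameter `a` are hypothetical names.

```scala
// Toy objective f(w) = 0.5 * a * w * w, whose gradient is a * w.
// The step used at iteration i is stepSize / sqrt(i), as described in the
// quoted message. This is an illustration only, not MLlib's implementation.
def runGd(stepSize: Double, a: Double, w0: Double, iterations: Int): Double =
  (1 to iterations).foldLeft(w0) { (w, i) =>
    w - (stepSize / math.sqrt(i.toDouble)) * (a * w)
  }

// With curvature a = 100, a step of 1.0 overshoots at every iteration and the
// weight grows in magnitude, while a step of 0.005 shrinks it towards zero.
val diverged  = runGd(stepSize = 1.0,   a = 100.0, w0 = 1.0, iterations = 100)
val converged = runGd(stepSize = 0.005, a = 100.0, w0 = 1.0, iterations = 100)
```

Each update multiplies the weight by (1 - stepSize * a / sqrt(i)), so the
iterates grow whenever that factor exceeds 1 in magnitude. Unscaled features
with large values have a similar effect to a large `a`.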
