Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
I tried adjusting stepSize between 1e-4 and 1; that doesn't seem to be the
problem. The actual problem is that the model doesn't use the intercept.
So what happens is that it tries to compensate with huge weights (on the
order of 1e40) and ends up overflowing the model coefficients. MSE explodes
too, as a consequence.
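
A quick way to confirm this, for anyone following along (a minimal sketch; assumes `trainedModel` is the model from the original post below):

{code}
// GeneralizedLinearModel exposes the fitted parameters directly
println(s"intercept = ${trainedModel.intercept}") // stays at 0.0 here
println(s"weights = ${trainedModel.weights}")     // entries blow up toward ~1e40
{code}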


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
Well, why not, but IMHO MLlib Logistic Regression is unusable right now.
The inability to use an intercept is just a no-go. I could hack a column of
ones into the data to emulate the intercept, but frankly it's a pity to have
to do so.
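
For reference, the column-of-ones hack could look like this (a sketch only, assuming `trainingData` is an RDD[LabeledPoint]; `withBias` is a name introduced here):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// append a constant 1.0 feature; its learned weight then plays the role of
// the intercept (note: with ridge, this weight also gets regularized)
val withBias = trainingData.map { p =>
  LabeledPoint(p.label, Vectors.dense(p.features.toArray :+ 1.0))
}
{code}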


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
OK, I've tried to add the intercept term myself (code here [1]), but with
no luck.

It seems that adding a column of ones doesn't help with convergence either.

I may have missed something in the coding, as I'm quite a noob in Scala, but
printing the data seems to indicate I succeeded in adding the ones column.

Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge
branch)
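
A sanity check along those lines might be (assuming the augmented RDD is called `withBias`, as in the earlier sketch):

{code}
// print a few rows to verify the trailing 1.0 column is present
withBias.take(3).foreach(p => println(p.features))
{code}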




Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-05 Thread DB Tsai
You may try LBFGS for more stable convergence. In Spark 1.1, we will be
able to use LBFGS instead of GD in the training process.
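
For the archive: the low-level optimizer can already be driven directly in Spark 1.0. A rough sketch for the ridge setting (parameter values are placeholders, `numFeatures` is assumed known, and the intercept would still have to come from a bias column):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SquaredL2Updater}

// LBFGS.runLBFGS expects (label, features) pairs rather than LabeledPoint
val data = trainingData.map(p => (p.label, p.features))

val (weights, lossHistory) = LBFGS.runLBFGS(
  data,
  new LeastSquaresGradient(), // squared-error loss
  new SquaredL2Updater(),     // L2 penalty, i.e. ridge
  10,   // number of corrections kept by L-BFGS
  1e-4, // convergence tolerance
  100,  // max iterations
  0.1,  // regParam (placeholder)
  Vectors.dense(new Array[Double](numFeatures)))
{code}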

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-04 Thread Thomas Robert
Hi all,

I too am having some issues with the *RegressionWithSGD algorithms.

Concerning your issue, Eustache, this could be due to the fact that these
regression algorithms use a fixed step (divided by sqrt(iteration)). During
my tests, the algorithm quite often diverged to an infinite cost, I guessed
because the step was too big. I reduced it and managed to get good results
on a very simple generated dataset.

But I was wondering if anyone here had any advice on using these regression
algorithms, for example how to choose a good step size and number of
iterations? I wonder if I'm using them right...

Thanks,

-- 

*Thomas ROBERT*
www.creativedata.fr
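
Concretely, the step size and iteration count Thomas mentions can be tuned through the optimizer member instead of the static train() helper; a sketch (the values are illustrative, not recommendations):

{code}
import org.apache.spark.mllib.regression.RidgeRegressionWithSGD

// the no-arg constructor is public, and `optimizer` exposes the SGD knobs;
// the effective step at iteration t is stepSize / sqrt(t)
val algo = new RidgeRegressionWithSGD()
algo.optimizer
  .setStepSize(0.01)
  .setNumIterations(500)
  .setRegParam(0.1)
val model = algo.run(trainingData)
{code}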


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-03 Thread Eustache DIEMERT
Printing the model shows the intercept is always 0 :(

Should I open a bug for that?


[mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-02 Thread Eustache DIEMERT
Hi list,

I'm benchmarking MLlib on a regression task [1] and am getting strange results.

Namely, with RidgeRegressionWithSGD the predictions seem to miss the
intercept:

{code}
val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
...
valuesAndPreds.take(10).map(t => println(t))
{code}

output:

(2007.0,-3.784588726958493E75)
(2003.0,-1.9562390324037716E75)
(2005.0,-4.147413202985629E75)
(2003.0,-1.524938024096847E75)
...

If I change the parameters (step size, regularization and iterations) I get
NaNs more often than not:
(2007.0,NaN)
(2003.0,NaN)
(2005.0,NaN)
...

On the other hand, the DecisionTree model gives sensible results.

I see there is a `setIntercept()` method in the abstract class
GeneralizedLinearAlgorithm that seems to enable the intercept, but I'm
unable to use it from the public interface :(

Any help appreciated :)

Eustache

[1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
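
For completeness, the elided steps between training and printing typically look something like this (a sketch, not the poster's actual code):

{code}
// build (label, prediction) pairs and compute a mean squared error
val valuesAndPreds = trainingData.map { p =>
  (p.label, trainedModel.predict(p.features))
}
val mse = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println(s"training MSE = $mse")
{code}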