Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
I tried adjusting stepSize between 1e-4 and 1; that doesn't seem to be the
problem. The actual problem is that the model doesn't use the intercept.
So what happens is that it tries to compensate with huge weights (on the
order of 1e40) and ends up overflowing the model coefficients. MSE explodes
too, as a consequence.
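
A quick way to confirm this, for anyone following along (a minimal sketch; assumes `trainedModel` is the model from the original post below):

{code}
// GeneralizedLinearModel exposes the fitted parameters directly
println(s"intercept = ${trainedModel.intercept}") // stays at 0.0 here
println(s"weights = ${trainedModel.weights}")     // entries blow up toward ~1e40
{code}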


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
Well, why not, but IMHO MLlib Logistic Regression is unusable right now.
The inability to use an intercept is just a no-go. I could hack a column of
ones into the data to emulate the intercept, but frankly it's a pity to have
to do so.
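
For reference, the column-of-ones hack could look like this (a sketch only, assuming `trainingData` is an RDD[LabeledPoint]; `withBias` is a name introduced here):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// append a constant 1.0 feature; its learned weight then plays the role of
// the intercept (note: with ridge, this weight also gets regularized)
val withBias = trainingData.map { p =>
  LabeledPoint(p.label, Vectors.dense(p.features.toArray :+ 1.0))
}
{code}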


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
OK, I've tried to add the intercept term myself (code here [1]), but with
no luck.

It seems that adding a column of ones doesn't help with convergence either.

I may have missed something in the coding, as I'm quite a noob in Scala, but
printing the data seems to indicate I succeeded in adding the ones column.

Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge
branch)
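
A sanity check along those lines might be (assuming the augmented RDD is called `withBias`, as in the earlier sketch):

{code}
// print a few rows to verify the trailing 1.0 column is present
withBias.take(3).foreach(p => println(p.features))
{code}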




Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-05 Thread DB Tsai
You may try LBFGS for more stable convergence. In Spark 1.1, we will be
able to use LBFGS instead of GD in the training process.
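
For the archive: the low-level optimizer can already be driven directly in Spark 1.0. A rough sketch for the ridge setting (parameter values are placeholders, `numFeatures` is assumed known, and the intercept would still have to come from a bias column):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SquaredL2Updater}

// LBFGS.runLBFGS expects (label, features) pairs rather than LabeledPoint
val data = trainingData.map(p => (p.label, p.features))

val (weights, lossHistory) = LBFGS.runLBFGS(
  data,
  new LeastSquaresGradient(), // squared-error loss
  new SquaredL2Updater(),     // L2 penalty, i.e. ridge
  10,   // number of corrections kept by L-BFGS
  1e-4, // convergence tolerance
  100,  // max iterations
  0.1,  // regParam (placeholder)
  Vectors.dense(new Array[Double](numFeatures)))
{code}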

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-04 Thread Thomas Robert
Hi all,

I too am having some issues with the *RegressionWithSGD algorithms.

Concerning your issue, Eustache, this could be due to the fact that these
regression algorithms use a fixed step (divided by sqrt(iteration)). During
my tests, the algorithm quite often diverged to an infinite cost, I guessed
because the step was too big. I reduced it and managed to get good results
on a very simple generated dataset.

But I was wondering if anyone here had any advice on using these regression
algorithms, for example how to choose a good step size and number of
iterations? I wonder if I'm using them right...

Thanks,

-- 

*Thomas ROBERT*
www.creativedata.fr
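
Concretely, the step size and iteration count Thomas mentions can be tuned through the optimizer member instead of the static train() helper; a sketch (the values are illustrative, not recommendations):

{code}
import org.apache.spark.mllib.regression.RidgeRegressionWithSGD

// the no-arg constructor is public, and `optimizer` exposes the SGD knobs;
// the effective step at iteration t is stepSize / sqrt(t)
val algo = new RidgeRegressionWithSGD()
algo.optimizer
  .setStepSize(0.01)
  .setNumIterations(500)
  .setRegParam(0.1)
val model = algo.run(trainingData)
{code}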


Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-03 Thread Eustache DIEMERT
Printing the model shows the intercept is always 0 :(

Should I open a bug for that?


[mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-02 Thread Eustache DIEMERT
Hi list,

I'm benchmarking MLlib on a regression task [1] and am getting strange results.

Namely, with RidgeRegressionWithSGD the predictions seem to miss the
intercept:

{code}
val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
...
valuesAndPreds.take(10).map(t => println(t))
{code}

output:

(2007.0,-3.784588726958493E75)
(2003.0,-1.9562390324037716E75)
(2005.0,-4.147413202985629E75)
(2003.0,-1.524938024096847E75)
...

If I change the parameters (step size, regularization and iterations) I get
NaNs more often than not:
(2007.0,NaN)
(2003.0,NaN)
(2005.0,NaN)
...

On the other hand, the DecisionTree model gives sensible results.

I see there is a `setIntercept()` method in the abstract class
GeneralizedLinearAlgorithm that seems to enable the intercept, but I'm
unable to use it from the public interface :(

Any help appreciated :)

Eustache

[1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
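
For completeness, the elided steps between training and printing typically look something like this (a sketch, not the poster's actual code):

{code}
// build (label, prediction) pairs and compute a mean squared error
val valuesAndPreds = trainingData.map { p =>
  (p.label, trainedModel.predict(p.features))
}
val mse = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println(s"training MSE = $mse")
{code}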