Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
I tried adjusting stepSize between 1e-4 and 1; it doesn't seem to be the problem. The actual problem is that the model doesn't use the intercept. So what happens is that it tries to compensate with extremely heavy weights (~1e40) and ends up overflowing the model coefficients. The MSE explodes too, as a consequence.

(in reply to Thomas Robert, 2014-07-04 22:22 GMT+02:00, tho...@creativedata.fr)
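Given the diagnosis above, one way to sidestep the missing intercept is to center the labels before training so that a zero intercept is actually correct for the transformed problem. A hedged sketch against the Spark 1.0 RDD API (`trainingData` is the `RDD[LabeledPoint]` from the original report; on Spark 1.0 the `mean()` on an `RDD[Double]` may require `import org.apache.spark.SparkContext._`):

{code}
import org.apache.spark.mllib.regression.LabeledPoint

// Shift the labels by their mean; remember the mean to undo the shift
// at prediction time.
val meanLabel = trainingData.map(_.label).mean()
val centered = trainingData.map(p => LabeledPoint(p.label - meanLabel, p.features))

// val model = RidgeRegressionWithSGD.train(centered, 1000)
// val prediction = model.predict(features) + meanLabel
{code}

This only removes the constant offset; it does not fix sensitivity to feature scaling.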
Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
Well, why not, but IMHO MLlib Logistic Regression is unusable right now. The inability to use an intercept is just a no-go. I could hack a column of ones into the data to inject the intercept, but frankly it's a pity to have to do so.

(in reply to DB Tsai, 2014-07-05 23:04 GMT+02:00, dbt...@dbtsai.com)
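For reference, the column-of-ones hack mentioned above can be sketched as follows (assuming the Spark 1.0 `LabeledPoint`/`Vectors` API; the weight learned for the constant column then plays the role of the intercept):

{code}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Append a constant 1.0 feature to every point; the weight learned for
// this column acts as the intercept term.
val withBias = trainingData.map { p =>
  LabeledPoint(p.label, Vectors.dense(p.features.toArray :+ 1.0))
}
{code}

Caveat: with ridge regression the bias weight gets L2-regularized along with all the others, which is usually undesirable for a true intercept.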
Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
Ok, I've tried to add the intercept term myself (code here [1]), but with no luck. It seems that adding a column of ones doesn't help with convergence either. I may have missed something in the coding, as I'm quite a noob in Scala, but printing the data seems to indicate I succeeded in adding the ones column.

Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge branch)

(in reply to Eustache DIEMERT, 2014-07-07 9:08 GMT+02:00, eusta...@diemert.fr)
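On the `setIntercept()` question: the method can be reached by instantiating the algorithm directly instead of going through the static `train` helpers. A sketch assuming the Spark 1.0/1.1 constructors; whether the SGD-based ridge variant accepts an intercept depends on the Spark version, so plain `LinearRegressionWithSGD` is shown here:

{code}
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val alg = new LinearRegressionWithSGD()
alg.setIntercept(true)            // fit an intercept term
alg.optimizer
  .setNumIterations(1000)
  .setStepSize(0.01)
val model = alg.run(trainingData) // trainingData: RDD[LabeledPoint]
{code}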
Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
You may try LBFGS to get more stable convergence. In Spark 1.1, we will be able to use LBFGS instead of GD in the training process.

(in reply to Thomas Robert, Jul 4, 2014 1:23 PM, tho...@creativedata.fr)
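The L-BFGS optimizer is already present in Spark 1.0 as a developer API, so it can be driven directly against a squared-error objective. A hedged sketch (class and method names as of Spark 1.0/1.1; `numFeatures` is assumed known, and the parameter values are illustrative):

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SquaredL2Updater}

// L-BFGS expects (label, features) pairs rather than LabeledPoint.
val data = trainingData.map(p => (p.label, p.features)).cache()

val (weights, lossHistory) = LBFGS.runLBFGS(
  data,
  new LeastSquaresGradient(),  // squared-error loss, as in ridge regression
  new SquaredL2Updater(),      // L2 regularization
  10,                          // number of corrections kept by L-BFGS
  1e-4,                        // convergence tolerance
  100,                         // max iterations
  0.1,                         // regularization parameter
  Vectors.dense(new Array[Double](numFeatures)))  // initial weights
{code}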
Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
Hi all,

I too am having some issues with the *RegressionWithSGD algorithms.

Concerning your issue Eustache, this could be due to the fact that these regression algorithms use a fixed step (divided by sqrt(iteration)). During my tests, quite often, the algorithm diverged to an infinite cost, I guess because the step was too big. I reduced it and managed to get good results on a very simple generated dataset.

But I was wondering if anyone here had some advice on the use of these regression algorithms, for example how to choose a good step size and number of iterations? I wonder if I'm using them right...

Thanks,

--
*Thomas ROBERT*
www.creativedata.fr

(in reply to Eustache DIEMERT, 2014-07-03 16:16 GMT+02:00, eusta...@diemert.fr)
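Absent better guidance, one pragmatic answer to the step-size question is a small grid search over `stepSize`, keeping the model with the lowest MSE. A sketch using the `train` overload that also takes step size and regularization (values are illustrative; in practice the MSE should be measured on a held-out split rather than the training data):

{code}
import org.apache.spark.mllib.regression.RidgeRegressionWithSGD

val steps = Seq(1e-4, 1e-3, 1e-2, 1e-1, 1.0)
val (bestMse, bestStep) = steps.map { step =>
  val model = RidgeRegressionWithSGD.train(trainingData, 1000, step, 0.01)
  val mse = trainingData.map { p =>
    val err = model.predict(p.features) - p.label
    err * err
  }.mean()
  (mse, step)
}.minBy(_._1)
{code}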
Re: [mllib] strange/buggy results with RidgeRegressionWithSGD
Printing the model shows the intercept is always 0 :( Should I open a bug for that?

(in reply to Eustache DIEMERT, 2014-07-02 16:11 GMT+02:00, eusta...@diemert.fr)
[mllib] strange/buggy results with RidgeRegressionWithSGD
Hi list,

I'm benchmarking MLlib for a regression task [1] and get strange results. Namely, using RidgeRegressionWithSGD it seems the predicted points miss the intercept:

{code}
val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
...
valuesAndPreds.take(10).map(t => println(t))
{code}

output:

(2007.0,-3.784588726958493E75)
(2003.0,-1.9562390324037716E75)
(2005.0,-4.147413202985629E75)
(2003.0,-1.524938024096847E75)
...

If I change the parameters (step size, regularization and iterations) I get NaNs more often than not:

(2007.0,NaN)
(2003.0,NaN)
(2005.0,NaN)
...

On the other hand the DecisionTree model gives sensible results.

I see there is a `setIntercept()` method in the abstract class GeneralizedLinearAlgorithm that seems to trigger the use of the intercept, but I'm unable to use it from the public interface :(

Any help appreciated :)

Eustache

[1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
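Worth noting for this dataset: SGD is very sensitive to feature scale, and the YearPredictionMSD labels (years around 2000) and raw audio features are far from unit scale, which alone can produce overflows like the ones above. A hedged sketch of standardizing the features with StandardScaler (available from Spark 1.1; on 1.0 the scaling would have to be done by hand):

{code}
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// Fit mean/variance statistics on the features, then standardize each point.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(trainingData.map(_.features))
val scaled = trainingData.map(p =>
  LabeledPoint(p.label, scaler.transform(p.features)))
{code}

Combined with label centering (or a fitted intercept), this usually makes the SGD-based linear models behave on data like this.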