Repository: spark
Updated Branches:
  refs/heads/branch-1.6 7db361032 -> 99938db6b


[SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in 
examples and user guide doc

ML ```LinearRegression``` uses ```data/mllib/sample_libsvm_data.txt``` as the
dataset in the examples and user guide doc, but that file is actually a
classification dataset rather than a regression dataset. We should use
```data/mllib/sample_linear_regression_data.txt``` instead.
The deeper cause is that ```LinearRegression``` with the "normal" solver cannot
solve this dataset correctly, possibly because the problem is ill-conditioned
and the labels are unreasonable for regression. This issue has been reported at
[SPARK-11918](https://issues.apache.org/jira/browse/SPARK-11918).
Users will be confused if they run the example code and get an exception, so we
should make this change, which clearly illustrates the usage of the
```LinearRegression``` algorithm.
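
The distinction between a classification dataset and a regression dataset can be checked roughly by looking at the label column of a LIBSVM-format file. The following is a minimal sketch, not part of this commit: the helper names `parse_libsvm_labels` and `looks_like_classification`, the `max_classes` threshold, and the sample rows are all hypothetical and made up for illustration.

```python
def parse_libsvm_labels(lines):
    """Return the label (first whitespace-separated token) of each non-empty line."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        labels.append(float(line.split()[0]))
    return labels


def looks_like_classification(labels, max_classes=10):
    """Heuristic (an assumption, not a Spark rule): a small number of distinct,
    integer-valued labels suggests class ids rather than regression targets."""
    distinct = set(labels)
    return len(distinct) <= max_classes and all(v.is_integer() for v in distinct)


# Hypothetical rows in LIBSVM format ("label index:value ..."),
# invented for this sketch and not copied from the Spark data files.
classification_rows = ["0 1:0.5 3:1.2", "1 2:0.7 3:0.1", "1 1:0.3", "0 2:0.9"]
regression_rows = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.83 2:0.12", "4.77 1:0.07 2:-0.58"]

print(looks_like_classification(parse_libsvm_labels(classification_rows)))  # True
print(looks_like_classification(parse_libsvm_labels(regression_rows)))      # False
```

To try it on a real file, pass an open file object instead of a list, for example `parse_libsvm_labels(open("data/mllib/sample_libsvm_data.txt"))`, assuming the path is resolved relative to a Spark checkout.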

Author: Yanbo Liang <[email protected]>

Closes #9905 from yanboliang/spark-11920.

(cherry picked from commit 98d7ec7df4bb115dbd84cb9acd744b6c8abfebd5)
Signed-off-by: Joseph K. Bradley <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99938db6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99938db6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99938db6

Branch: refs/heads/branch-1.6
Commit: 99938db6b479f4b2d08bb6dffdf0c2bac8ce5023
Parents: 7db3610
Author: Yanbo Liang <[email protected]>
Authored: Mon Nov 23 11:51:29 2015 -0800
Committer: Joseph K. Bradley <[email protected]>
Committed: Mon Nov 23 11:51:40 2015 -0800

----------------------------------------------------------------------
 .../examples/ml/JavaLinearRegressionWithElasticNetExample.java    | 2 +-
 examples/src/main/python/ml/linear_regression_with_elastic_net.py | 3 ++-
 .../spark/examples/ml/LinearRegressionWithElasticNetExample.scala | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/99938db6/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
----------------------------------------------------------------------
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
index 593f8fb..4ad7676 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
@@ -37,7 +37,7 @@ public class JavaLinearRegressionWithElasticNetExample {
     // $example on$
     // Load training data
     DataFrame training = sqlContext.read().format("libsvm")
-      .load("data/mllib/sample_libsvm_data.txt");
+      .load("data/mllib/sample_linear_regression_data.txt");
 
     LinearRegression lr = new LinearRegression()
       .setMaxIter(10)

http://git-wip-us.apache.org/repos/asf/spark/blob/99938db6/examples/src/main/python/ml/linear_regression_with_elastic_net.py
----------------------------------------------------------------------
diff --git a/examples/src/main/python/ml/linear_regression_with_elastic_net.py b/examples/src/main/python/ml/linear_regression_with_elastic_net.py
index b027827..a4cd40c 100644
--- a/examples/src/main/python/ml/linear_regression_with_elastic_net.py
+++ b/examples/src/main/python/ml/linear_regression_with_elastic_net.py
@@ -29,7 +29,8 @@ if __name__ == "__main__":
 
     # $example on$
     # Load training data
-    training = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
+    training = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
 
     lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
 

http://git-wip-us.apache.org/repos/asf/spark/blob/99938db6/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala
----------------------------------------------------------------------
diff --git a/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala b/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala
index 5a51ece..22c824c 100644
--- a/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala
@@ -33,7 +33,8 @@ object LinearRegressionWithElasticNetExample {
 
     // $example on$
     // Load training data
-    val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
+    val training = sqlCtx.read.format("libsvm")
+      .load("data/mllib/sample_linear_regression_data.txt")
 
     val lr = new LinearRegression()
       .setMaxIter(10)

