Yanbo Liang created SPARK-11920:
-----------------------------------

             Summary: ML LinearRegression should use correct dataset in 
examples and user guide doc
                 Key: SPARK-11920
                 URL: https://issues.apache.org/jira/browse/SPARK-11920
             Project: Spark
          Issue Type: Improvement
          Components: Documentation, ML
            Reporter: Yanbo Liang
            Priority: Minor


ML LinearRegression uses data/mllib/sample_libsvm_data.txt as the dataset in 
its examples and user guide doc, but that file is actually a classification 
dataset rather than a regression dataset. We should use 
data/mllib/sample_linear_regression_data.txt instead.
Another reason is that LinearRegression with the "normal" solver cannot solve 
this dataset correctly, possibly because the problem is ill-conditioned and 
the labels are unsuitable for regression. This issue has been reported as 
SPARK-11918.
We should therefore make this change in the examples and user guide so that 
they clearly illustrate the usage of the LinearRegression algorithm.
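
To make the classification-versus-regression distinction concrete, here is a 
minimal sketch of a label check for LIBSVM-formatted text, where each line is 
"label index:value ...". The function names, the max_classes threshold, and 
the sample rows are assumptions made for illustration; they are not part of 
Spark and the rows are not copied from the Spark data files.

```python
from typing import Iterable, List


def read_labels(lines: Iterable[str]) -> List[float]:
    """Return the label (first token) of every non-empty LIBSVM line."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        labels.append(float(line.split()[0]))
    return labels


def looks_like_classification(labels: List[float], max_classes: int = 10) -> bool:
    """Heuristic only: few distinct, integer-valued labels suggest class ids."""
    distinct = set(labels)
    return len(distinct) <= max_classes and all(v.is_integer() for v in distinct)


# Hypothetical rows for illustration, not taken from the Spark repository.
classification_rows = ["0 1:0.5 3:1.2", "1 2:0.7 3:0.1", "1 1:0.9", "0 2:0.3"]
regression_rows = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.8 2:0.7", "4.13 1:0.2 2:-0.1"]

print(looks_like_classification(read_labels(classification_rows)))
print(looks_like_classification(read_labels(regression_rows)))
```

The same check could be run on a real file by passing an open file object to 
read_labels. It is only a heuristic: a regression dataset with a handful of 
integer-valued targets would also be flagged.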



