Yanbo Liang created SPARK-11920:
-----------------------------------
Summary: ML LinearRegression should use correct dataset in
examples and user guide doc
Key: SPARK-11920
URL: https://issues.apache.org/jira/browse/SPARK-11920
Project: Spark
Issue Type: Improvement
Components: Documentation, ML
Reporter: Yanbo Liang
Priority: Minor
ML LinearRegression uses data/mllib/sample_libsvm_data.txt as the dataset in
the examples and user guide doc, but it is actually a classification dataset
rather than a regression dataset. We should use
data/mllib/sample_linear_regression_data.txt instead.
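To illustrate the difference between the two kinds of dataset, here is a minimal sketch that reads the labels out of LIBSVM-formatted lines and applies a rough heuristic: a small set of integer-valued labels suggests classification, while real-valued labels suggest regression. The helper names, the `max_classes` threshold, and the sample rows are all hypothetical; the rows are invented for illustration and are not copied from the Spark data files.

```python
def libsvm_labels(lines):
    """Return the label (first whitespace-separated token) of each non-empty line."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        labels.append(float(line.split()[0]))
    return labels


def looks_like_classification(labels, max_classes=10):
    """Heuristic only: few distinct labels, all of them integer-valued."""
    distinct = set(labels)
    return len(distinct) <= max_classes and all(v.is_integer() for v in distinct)


# Invented rows in LIBSVM format: "<label> <index>:<value> ..."
classification_rows = ["0 1:0.5 3:1.2", "1 2:0.7 3:0.1", "1 1:0.9 2:0.4"]
regression_rows = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.83 2:0.42", "6.08 1:0.12 2:-0.97"]

is_classification = looks_like_classification(libsvm_labels(classification_rows))
is_regression_like = not looks_like_classification(libsvm_labels(regression_rows))
```

A check of this kind is only a sanity test; integer-valued regression targets would be misreported, so the threshold would need adjusting for real data.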
Another reason is that LinearRegression with the "normal" solver cannot solve
this dataset correctly, possibly because the problem is ill-conditioned and the
labels are unreasonable for regression. This issue has been reported in
SPARK-11918.
So we should make this change in the examples and user guide, so that they
clearly illustrate the usage of the LinearRegression algorithm.
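As a rough sketch of what the updated example could look like in PySpark, the function below loads the regression dataset and fits a LinearRegression model. This is not the final example code: the function name, the `LR_PARAMS` values, and the `session` argument are assumptions made for illustration. `session` is expected to be any object exposing `.read` (an SQLContext or, in later Spark versions, a SparkSession), and the pyspark import is deferred so the module can be loaded without Spark installed.

```python
# Path proposed in this issue for the LinearRegression examples.
REGRESSION_DATA = "data/mllib/sample_linear_regression_data.txt"

# Illustrative hyperparameters (assumed values, not taken from the issue).
LR_PARAMS = {"maxIter": 10, "regParam": 0.3, "elasticNetParam": 0.8}


def fit_linear_regression(session, path=REGRESSION_DATA, solver="auto"):
    """Load a LIBSVM file as a DataFrame and fit a LinearRegression model.

    `session` is assumed to expose `.read` (SQLContext or SparkSession).
    """
    # Deferred import: requires a Spark installation only when called.
    from pyspark.ml.regression import LinearRegression

    training = session.read.format("libsvm").load(path)
    lr = LinearRegression(solver=solver, **LR_PARAMS)
    return lr.fit(training)
```

Calling `fit_linear_regression(session)` returns a fitted model whose `coefficients` and `intercept` can then be printed. Note that in Spark the "normal" solver supports only L2 regularization, so passing `solver="normal"` would also require setting `elasticNetParam` to 0.0.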