Repository: spark
Updated Branches:
  refs/heads/branch-2.0 e4bb31fb3 -> 27f3462d0

[SPARK-16231][PYSPARK][ML][EXAMPLES] dataframe_example.py fails to convert ML style vectors

## What changes were proposed in this pull request?

Need to convert ML Vectors to the old MLlib style before doing Statistics.colStats operations on the DataFrame

## How was this patch tested?

Ran example, local tests

Author: Bryan Cutler <cutl...@gmail.com>

Closes #13928 from BryanCutler/pyspark-ml-example-vector-conv-SPARK-16231.

(cherry picked from commit 1aa191e58e905f470f73663fc1c35f36e05e929a)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27f3462d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27f3462d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27f3462d

Branch: refs/heads/branch-2.0
Commit: 27f3462d0e11b4768140e452f02ab043438b8e86
Parents: e4bb31f
Author: Bryan Cutler <cutl...@gmail.com>
Authored: Mon Jun 27 12:58:39 2016 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Jun 27 14:12:31 2016 -0700

----------------------------------------------------------------------
 examples/src/main/python/ml/dataframe_example.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/27f3462d/examples/src/main/python/ml/dataframe_example.py
----------------------------------------------------------------------
diff --git a/examples/src/main/python/ml/dataframe_example.py b/examples/src/main/python/ml/dataframe_example.py
index a7d8b90..c1818d7 100644
--- a/examples/src/main/python/ml/dataframe_example.py
+++ b/examples/src/main/python/ml/dataframe_example.py
@@ -28,6 +28,7 @@ import shutil
 
 from pyspark.sql import SparkSession
 from pyspark.mllib.stat import Statistics
+from pyspark.mllib.util import MLUtils
 
 if __name__ == "__main__":
     if len(sys.argv) > 2:
@@ -55,7 +56,8 @@ if __name__ == "__main__":
     labelSummary.show()
 
     # Convert features column to an RDD of vectors.
-    features = df.select("features").rdd.map(lambda r: r.features)
+    features = MLUtils.convertVectorColumnsFromML(df, "features") \
+        .select("features").rdd.map(lambda r: r.features)
     summary = Statistics.colStats(features)
     print("Selected features column with average values:\n" +
           str(summary.mean()))