spark git commit: [SPARK-16231][PYSPARK][ML][EXAMPLES] dataframe_example.py fails to convert ML style vectors

meng Mon, 27 Jun 2016 14:12:57 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 e4bb31fb3 -> 27f3462d0



[SPARK-16231][PYSPARK][ML][EXAMPLES] dataframe_example.py fails to convert ML style vectors

## What changes were proposed in this pull request?
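The DataFrame's features column holds ML-style vectors (pyspark.ml.linalg), but Statistics.colStats expects the older MLlib-style vectors (pyspark.mllib.linalg). The example now converts the column with MLUtils.convertVectorColumnsFromML before running Statistics.colStats on it.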

## How was this patch tested?
Ran the example and local tests.

Author: Bryan Cutler <[email protected]>

Closes #13928 from BryanCutler/pyspark-ml-example-vector-conv-SPARK-16231.

(cherry picked from commit 1aa191e58e905f470f73663fc1c35f36e05e929a)
Signed-off-by: Xiangrui Meng <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27f3462d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27f3462d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27f3462d

Branch: refs/heads/branch-2.0
Commit: 27f3462d0e11b4768140e452f02ab043438b8e86
Parents: e4bb31f
Author: Bryan Cutler <[email protected]>
Authored: Mon Jun 27 12:58:39 2016 -0700
Committer: Xiangrui Meng <[email protected]>
Committed: Mon Jun 27 14:12:31 2016 -0700

----------------------------------------------------------------------
 examples/src/main/python/ml/dataframe_example.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/27f3462d/examples/src/main/python/ml/dataframe_example.py
----------------------------------------------------------------------
diff --git a/examples/src/main/python/ml/dataframe_example.py b/examples/src/main/python/ml/dataframe_example.py
index a7d8b90..c1818d7 100644
--- a/examples/src/main/python/ml/dataframe_example.py
+++ b/examples/src/main/python/ml/dataframe_example.py
@@ -28,6 +28,7 @@ import shutil
 
 from pyspark.sql import SparkSession
 from pyspark.mllib.stat import Statistics
+from pyspark.mllib.util import MLUtils
 
 if __name__ == "__main__":
     if len(sys.argv) > 2:
@@ -55,7 +56,8 @@ if __name__ == "__main__":
     labelSummary.show()
 
     # Convert features column to an RDD of vectors.
-    features = df.select("features").rdd.map(lambda r: r.features)
+    features = MLUtils.convertVectorColumnsFromML(df, "features") \
+        .select("features").rdd.map(lambda r: r.features)
     summary = Statistics.colStats(features)
     print("Selected features column with average values:\n" +
           str(summary.mean()))


