Joseph K. Bradley created SPARK-6121:
----------------------------------------
Summary: Python DataFrame type inference for LabeledPoint gets wrong type
Key: SPARK-6121
URL: https://issues.apache.org/jira/browse/SPARK-6121
Project: Spark
Issue Type: Bug
Components: MLlib, PySpark, SQL
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Priority: Minor
In PySpark, when an RDD of LabeledPoints is converted to a DataFrame using
toDF(), the features column of the returned DataFrame has type "null"
instead of VectorUDT.
To reproduce:
{code}
# Assumes the pyspark shell, where sc (the SparkContext) is predefined.
from pyspark.mllib.util import MLUtils
rdd = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
df = rdd.toDF()
{code}
Examine df to see the wrong type inferred for the features column:
{code}
>>> df
DataFrame[features: null, label: double]
{code}
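A small check along these lines might help confirm the bug, or a later fix. This is only a sketch: null_typed_columns is a hypothetical helper, not part of PySpark, and it assumes the column types are available as (name, type) string pairs, the shape that df.dtypes reports.

```python
def null_typed_columns(dtypes):
    """Return the names of columns whose inferred type is "null".

    dtypes is a list of (column_name, type_name) string pairs,
    the shape that DataFrame.dtypes returns in PySpark.
    """
    return [name for name, type_name in dtypes if type_name == "null"]


# Pairs matching the DataFrame repr shown in the report above.
reported = [("features", "null"), ("label", "double")]
print(null_typed_columns(reported))  # prints ['features']
```

In the shell, null_typed_columns(df.dtypes) should presumably return an empty list once the features column is inferred as VectorUDT.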