Joseph K. Bradley created SPARK-10305:
-----------------------------------------

             Summary: PySpark createDataFrame on list of LabeledPoints fails (regression)
                 Key: SPARK-10305
                 URL: https://issues.apache.org/jira/browse/SPARK-10305
             Project: Spark
          Issue Type: Bug
          Components: ML, PySpark, SQL
    Affects Versions: 1.5.0
            Reporter: Joseph K. Bradley
            Priority: Critical

The following code works in 1.4 but fails in 1.5:

{code}
import numpy as np
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.linalg import Vectors

lp1 = LabeledPoint(1.0, Vectors.sparse(5, np.array([0, 1]), np.array([2.0, 21.0])))
lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
tmp = [lp1, lp2]
sqlContext.createDataFrame(tmp).show()
{code}

The failure is:

{code}
ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-0e7cb8772e10> in <module>()
      6 lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
      7 tmp = [lp1, lp2]
----> 8 sqlContext.createDataFrame(tmp).show()

/home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
    404             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
    405         else:
--> 406             rdd, schema = self._createFromLocal(data, schema)
    407         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
    408         jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())

/home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in _createFromLocal(self, data, schema)
    335
    336         # convert python objects to sql data
--> 337         data = [schema.toInternal(row) for row in data]
    338         return self._sc.parallelize(data), schema
    339

/home/ubuntu/databricks/spark/python/pyspark/sql/types.pyc in toInternal(self, obj)
    539                     return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
    540                 else:
--> 541                     raise ValueError("Unexpected tuple %r with StructType" % obj)
    542             else:
    543                 if isinstance(obj, dict):

ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
{code}
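
The traceback suggests that the row conversion in toInternal reaches its final else branch because a LabeledPoint is not a tuple or list. The sketch below is a plain-Python illustration of that dispatch and does not use Spark. Every name in it (Point, to_internal_strict, to_internal_lenient) is hypothetical, the dict branch is inferred from the context lines in the traceback, and the attribute fallback is only one possible way such a check could be relaxed. It is not presented as the actual code or fix in pyspark.

```python
# Hypothetical, simplified stand-in for the branch structure visible in the
# traceback. None of these names exist in pyspark.


class Point(object):
    """A plain object with attributes, standing in for LabeledPoint."""

    def __init__(self, label, features):
        self.label = label
        self.features = features


def to_internal_strict(field_names, obj):
    """Accept dicts, tuples and lists only; reject every other object."""
    if isinstance(obj, dict):
        return tuple(obj.get(name) for name in field_names)
    elif isinstance(obj, (tuple, list)):
        return tuple(obj)
    else:
        # A plain object such as Point ends up here, as in the report.
        raise ValueError("Unexpected tuple %r with StructType" % (obj,))


def to_internal_lenient(field_names, obj):
    """Same dispatch, plus an assumed fallback that reads obj.__dict__."""
    if isinstance(obj, dict):
        return tuple(obj.get(name) for name in field_names)
    elif isinstance(obj, (tuple, list)):
        return tuple(obj)
    elif hasattr(obj, "__dict__"):
        attrs = obj.__dict__
        return tuple(attrs.get(name) for name in field_names)
    else:
        raise ValueError("Unexpected tuple %r with StructType" % (obj,))


if __name__ == "__main__":
    names = ["label", "features"]
    point = Point(1.0, [2.0, 21.0])
    try:
        to_internal_strict(names, point)
    except ValueError as exc:
        print("strict:", exc)
    print("lenient:", to_internal_lenient(names, point))
```

The strict variant rejects the attribute-only object, while the lenient variant reads the same fields by name from the instance dictionary.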