[ https://issues.apache.org/jira/browse/SPARK-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715639#comment-14715639 ]
Apache Spark commented on SPARK-10305:
--------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/8470

> PySpark createDataFrame on list of LabeledPoints fails (regression)
> -------------------------------------------------------------------
>
>                 Key: SPARK-10305
>                 URL: https://issues.apache.org/jira/browse/SPARK-10305
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark, SQL
>    Affects Versions: 1.5.0
>            Reporter: Joseph K. Bradley
>            Priority: Critical
>
> The following code works in 1.4 but fails in 1.5:
> {code}
> import numpy as np
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.linalg import Vectors
> lp1 = LabeledPoint(1.0, Vectors.sparse(5, np.array([0, 1]), np.array([2.0, 21.0])))
> lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
> tmp = [lp1, lp2]
> sqlContext.createDataFrame(tmp).show()
> {code}
> The failure is:
> {code}
> ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-1-0e7cb8772e10> in <module>()
>       6 lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
>       7 tmp = [lp1, lp2]
> ----> 8 sqlContext.createDataFrame(tmp).show()
> /home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
>     404             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
>     405         else:
> --> 406             rdd, schema = self._createFromLocal(data, schema)
>     407         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
>     408         jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
> /home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in _createFromLocal(self, data, schema)
>     335 
>     336         # convert python objects to sql data
> --> 337         data = [schema.toInternal(row) for row in data]
>     338         return self._sc.parallelize(data), schema
>     339 
> /home/ubuntu/databricks/spark/python/pyspark/sql/types.pyc in toInternal(self, obj)
>     539                 return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
>     540             else:
>     541                 raise ValueError("Unexpected tuple %r with StructType" % obj)
>     542         else:
>     543             if isinstance(obj, dict):
> ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
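The traceback shows StructType.toInternal rejecting a LabeledPoint that it cannot map onto the schema's fields. The sketch below is illustrative only, not the actual Spark source: LabeledPointLike and to_internal are hypothetical stand-ins that mimic the shape of the failure, where a converter accepts dicts and plain tuples of field values but raises on a custom row-like object.

```python
# Illustrative sketch of the failing check, with hypothetical names.
# LabeledPointLike stands in for pyspark.mllib.regression.LabeledPoint;
# to_internal stands in for StructType.toInternal in pyspark/sql/types.py.

class LabeledPointLike(object):
    """Custom object that is not a dict and not a plain tuple/list."""
    def __init__(self, label, features):
        self.label = label
        self.features = features

    def __repr__(self):
        return "LabeledPoint(%s, %s)" % (self.label, self.features)

def to_internal(fields, obj):
    """Toy converter: accept dicts keyed by field name, or tuples/lists
    whose length matches the schema; reject anything else."""
    if isinstance(obj, dict):
        return tuple(obj.get(f) for f in fields)
    if isinstance(obj, (tuple, list)) and len(obj) == len(fields):
        return tuple(obj)
    raise ValueError("Unexpected tuple %r with StructType" % (obj,))

fields = ["label", "features"]
lp = LabeledPointLike(1.0, [2.0, 21.0])

# A plain tuple of field values converts cleanly.
ok = to_internal(fields, (lp.label, lp.features))

# The custom object itself is rejected, mirroring the ValueError above.
try:
    to_internal(fields, lp)
except ValueError as e:
    print(e)
```

Unpacking each object into a plain tuple (or dict) of field values before calling the converter sidesteps the rejected branch, which is why row-shaped inputs succeed where the custom class fails.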