Michael Armbrust created SPARK-5898:
---------------------------------------
Summary: Can't create DataFrame from Pandas data frame
Key: SPARK-5898
URL: https://issues.apache.org/jira/browse/SPARK-5898
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Assignee: Davies Liu
Priority: Critical
{code}
data = sqlContext.table("sparkCommits")
p = data.toPandas()
sqlContext.createDataFrame(p)
{code}
{code}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-fb4f1895bd2f> in <module>()
      1 data = sqlContext.table("sparkCommits")
      2 p = data.toPandas()
----> 3 sqlContext.createDataFrame(p)

/home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
    385             data = self._sc.parallelize(data.to_records(index=False))
    386             if schema is None:
--> 387                 schema = list(data.columns)
    388
    389         if not isinstance(data, RDD):

AttributeError: 'RDD' object has no attribute 'columns'
{code}
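
The traceback suggests an ordering problem: line 385 rebinds `data` to the RDD returned by `parallelize`, so by line 387 `data` is no longer the Pandas frame and has no `columns`. The sketch below reproduces that ordering without Spark or Pandas and shows one possible reordering. `FakeRDD`, `FakeFrame`, `parallelize`, `create_buggy`, `create_fixed`, and the sample column names are all hypothetical stand-ins, not the actual PySpark code or the fix that was committed.

```python
# Hypothetical stand-ins so the sketch runs without Spark or Pandas.
class FakeRDD:
    """Mimics an RDD: holds rows and has no `columns` attribute."""
    def __init__(self, rows):
        self.rows = list(rows)


class FakeFrame:
    """Mimics the two pandas.DataFrame members seen in the traceback."""
    def __init__(self, columns, rows):
        self.columns = columns
        self._rows = rows

    def to_records(self, index=False):
        return list(self._rows)


def parallelize(records):
    # Stand-in for self._sc.parallelize(...)
    return FakeRDD(records)


def create_buggy(data, schema=None):
    # Same order as lines 385-387 in the traceback: `data` is rebound first.
    data = parallelize(data.to_records(index=False))
    if schema is None:
        schema = list(data.columns)  # fails: FakeRDD has no `columns`
    return data, schema


def create_fixed(data, schema=None):
    # Read the column names before `data` is rebound to the RDD.
    if schema is None:
        schema = list(data.columns)
    data = parallelize(data.to_records(index=False))
    return data, schema


# Sample data is made up for illustration only.
frame = FakeFrame(["author", "sha"], [("a", "1"), ("b", "2")])

try:
    create_buggy(frame)
    buggy_error = None
except AttributeError as exc:
    buggy_error = str(exc)

rdd, schema = create_fixed(frame)
```

Reading the schema before the rebinding keeps the rest of the function unchanged, which is why the reordering is shown here rather than a second variable name.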