[ https://issues.apache.org/jira/browse/SPARK-23009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318990#comment-16318990 ]
Bryan Cutler commented on SPARK-23009:
--------------------------------------

I can put in a fix for this

> PySpark should not assume Pandas cols are a basestring type
> -----------------------------------------------------------
>
>                 Key: SPARK-23009
>                 URL: https://issues.apache.org/jira/browse/SPARK-23009
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>
> When calling {{SparkSession.createDataFrame}} using a Pandas DataFrame as
> input, Spark assumes that the columns will be either a {{str}} type or a
> {{unicode}} type. They can actually be any type that can be used as a dict
> key. If they are not a {{basestring}} type, then a confusing AttributeError
> is thrown:
> {code}
> In [16]: pdf = pd.DataFrame(np.random.rand(4, 2))
>
> In [17]: pdf
> Out[17]:
>           0         1
> 0  0.145171  0.482940
> 1  0.151336  0.299861
> 2  0.220338  0.830133
> 3  0.001659  0.513787
>
> In [18]: pdf.columns
> Out[18]: RangeIndex(start=0, stop=2, step=1)
>
> In [19]: df = spark.createDataFrame(pdf)
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-18-11bcb07e0e39> in <module>()
> ----> 1 df = spark.createDataFrame(pdf)
>
> /home/bryan/git/spark/python/pyspark/sql/session.pyc in createDataFrame(self, data, schema, samplingRatio, verifySchema)
>     646         # If no schema supplied by user then get the names of columns only
>     647         if schema is None:
> --> 648             schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in data.columns]
>     649
>     650         if self.conf.get("spark.sql.execution.arrow.enabled", "false").lower() == "true" \
>
> AttributeError: 'int' object has no attribute 'encode'
> {code}
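Until a fix lands, a minimal workaround sketch is to coerce the column labels to strings before calling {{createDataFrame}}, so the schema derivation at session.py line 648 never calls {{.encode}} on a non-string label. This is only a user-side workaround under stated assumptions (a local SparkSession created inline), not the eventual Spark patch, which may instead convert labels inside {{createDataFrame}} itself:

{code}
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

# Assumption: a local session for demonstration; any active session works.
spark = SparkSession.builder.master("local[1]").getOrCreate()

# Reproduce the problem input: a default RangeIndex gives int column labels.
pdf = pd.DataFrame(np.random.rand(4, 2))

# Workaround: convert every column label to str before handing the frame
# to createDataFrame, avoiding the AttributeError shown above.
pdf.columns = [str(c) for c in pdf.columns]

df = spark.createDataFrame(pdf)
df.show()
{code}

Converting with {{str()}} rather than rejecting non-string labels mirrors how pandas itself displays such frames, so the resulting Spark column names ("0", "1", ...) match what the user already sees when printing the DataFrame.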