[ https://issues.apache.org/jira/browse/SPARK-23299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361966#comment-16361966 ]
Shashwat Anand commented on SPARK-23299:
----------------------------------------

[~hyukjin.kwon] What do we do about this?

> __repr__ broken for Rows instantiated with *args
> ------------------------------------------------
>
>                 Key: SPARK-23299
>                 URL: https://issues.apache.org/jira/browse/SPARK-23299
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.0, 2.2.0
>         Environment: Tested on OS X with Spark 1.5.0 as well as pip-installed
> `pyspark` 2.2.0. Code in question appears to still be in error on the master
> branch of the GitHub repository.
>            Reporter: Oli Hall
>            Priority: Minor
>
> PySpark Rows throw an exception if instantiated without column names when
> `__repr__` is called. The most minimal reproducible example I've found is
> this:
> {code:java}
> > from pyspark.sql.types import Row
> > Row(123)
> <stack-trace snipped for brevity>
> <v-env location>/lib/python2.7/site-packages/pyspark/sql/types.pyc in
> __repr__(self)
> -> 1524 return "<Row(%s)>" % ", ".join(self)
> TypeError: sequence item 0: expected string, int found{code}
> This appears to be due to the implementation of `__repr__`, which works
> excellently for Rows created with column names, but for those without,
> assumes all values are strings ([link
> here|https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1584]).
> This should be an easy fix: if the values are mapped to `str` first, all
> should be well (the last line is the only modification):
> {code:java}
> def __repr__(self):
>     """Printable representation of Row used in Python REPL."""
>     if hasattr(self, "__fields__"):
>         return "Row(%s)" % ", ".join("%s=%r" % (k, v)
>                                      for k, v in zip(self.__fields__,
>                                                      tuple(self)))
>     else:
>         return "<Row(%s)>" % ", ".join(map(str, self))
> {code}
> This will yield the following:
> {code:java}
> > from pyspark.sql.types import Row
> > Row('aaa', 123)
> <Row(aaa, 123)>
> {code}
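
For anyone who wants to try the proposed change without a Spark install, here is a minimal, self-contained sketch. `SketchRow` is a hypothetical stand-in for `pyspark.sql.types.Row`, not the real class; its constructor is an assumption that only mimics the positional and keyword cases, and only the `__repr__` body follows the issue.

```python
class SketchRow(tuple):
    """Hypothetical stand-in for pyspark.sql.types.Row (not the real class)."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("use positional or keyword arguments, not both")
        if kwargs:
            # Assumption for this sketch: keyword fields are stored sorted by name.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            return row
        # Positional case: no __fields__ attribute is set.
        return tuple.__new__(cls, args)

    def __repr__(self):
        if hasattr(self, "__fields__"):
            return "Row(%s)" % ", ".join(
                "%s=%r" % (k, v) for k, v in zip(self.__fields__, tuple(self))
            )
        # Proposed change: map values to str so join() accepts non-strings.
        return "<Row(%s)>" % ", ".join(map(str, self))


# The unpatched expression fails because join() only accepts strings.
try:
    "<Row(%s)>" % ", ".join((123,))
except TypeError as exc:
    print("unpatched:", exc)

print(repr(SketchRow(123)))                 # <Row(123)>
print(repr(SketchRow("aaa", 123)))          # <Row(aaa, 123)>
print(repr(SketchRow(name="aaa", n=123)))   # Row(n=123, name='aaa')
```

Note that mapping with `str` drops the quotes around string values, as in the expected output in the issue; mapping with `repr` instead would keep them, which matches how the named-field branch already formats values with `%r`.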