Arttu Voutilainen created SPARK-26810:
-----------------------------------------
Summary: Fixing SPARK-25072 broke existing code and fails to show
error message
Key: SPARK-26810
URL: https://issues.apache.org/jira/browse/SPARK-26810
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.0
Reporter: Arttu Voutilainen
Hey,
We recently upgraded Spark, and the change made in
https://issues.apache.org/jira/browse/SPARK-25072 caused our pipeline to fail.
Annoyingly, formatting the error message also threw an exception of its own,
hiding the message we should have seen.
Repro using gettyimages/docker-spark, on 2.4.0:
{code}
from pyspark.sql import Row
r = Row(['a','b'])
r('1', '2')
{code}
{code}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1505, in __call__
    "but got %s" % (self, len(self), args))
  File "/usr/spark-2.4.0/python/pyspark/sql/types.py", line 1552, in __repr__
    return "<Row(%s)>" % ", ".join(self)
TypeError: sequence item 0: expected str instance, list found
{code}
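The TypeError in the traceback above comes from str.join, which only accepts string items. Here is a minimal sketch that reproduces it without Spark. It assumes the Row simply holds its constructor arguments as a tuple, which is what the __repr__ line in the traceback suggests. The names naive_repr and tolerant_repr are made up for illustration and are not part of PySpark:

```python
# Stand-in for the contents of Row(['a', 'b']): a one-element tuple
# whose single item is a list (assumption, not the real Row class).
fields = (['a', 'b'],)


def naive_repr(items):
    # Same format expression as the __repr__ line in the traceback.
    return "<Row(%s)>" % ", ".join(items)


def tolerant_repr(items):
    # Hypothetical variant: keep strings as they are and fall back to
    # repr() for anything else, so formatting never raises on odd input.
    return "<Row(%s)>" % ", ".join(
        i if isinstance(i, str) else repr(i) for i in items
    )


try:
    naive_repr(fields)
except TypeError as exc:
    print("naive_repr failed:", exc)

print(tolerant_repr(fields))
```

With a repr along the lines of tolerant_repr, the original "expected N values" error could have been formatted and shown instead of being masked.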
The same pattern worked on 2.3.1. This also shows how we used it:
{code}
from pyspark.sql import Row, types as T
r = Row(['a','b'])
df = spark.createDataFrame([Row(col='doesntmatter')])
rdd = df.rdd.mapPartitions(lambda p: [r('a1','b2')])
spark.createDataFrame(rdd, T.StructType([T.StructField('a', T.StringType()),
    T.StructField('b', T.StringType())])).collect()
{code}
{code}
[Row(a='a1', b='b2'), Row(a='a1', b='b2')]
{code}
While I do think the code we had was quite horrible, it used to work. The
unexpected error comes from __repr__, which assumes that the arguments given to
the Row constructor are strings. That sounds like a reasonable assumption, so
maybe the Row constructor should validate that it holds? (I guess that might be
another potentially breaking change, though, if someone else has code as weird
as ours...)
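To make the validation idea concrete, here is a rough sketch of the kind of check I have in mind. It is not taken from PySpark; validate_field_names is a hypothetical helper, and where exactly such a check would live in the Row constructor is left open:

```python
def validate_field_names(names):
    """Hypothetical check: reject any field name that is not a string."""
    bad = [n for n in names if not isinstance(n, str)]
    if bad:
        # %r keeps the message safe to format even for odd inputs.
        raise TypeError("Row field names must be strings, got: %r" % (bad,))
    return tuple(names)


print(validate_field_names(['a', 'b']))

try:
    # Mirrors Row(['a', 'b']): one positional argument that is a list.
    validate_field_names([['a', 'b']])
except TypeError as exc:
    print("rejected:", exc)
```

A check like this would fail early, at construction time, with a message that names the offending argument.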