Tomasz Bartczak created SPARK-23477:
---------------------------------------
Summary: Misleading exception message when union fails due to metadata
Key: SPARK-23477
URL: https://issues.apache.org/jira/browse/SPARK-23477
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1
Reporter: Tomasz Bartczak
When I have two DataFrames that differ only in the metadata of fields inside a
struct, I cannot union them, but the error message shows the two types as
identical:
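{code:java}
from pyspark.sql.functions import col, struct

df = spark.createDataFrame([{'a': 1}])
a = df.select(struct('a').alias('x'))
# b differs from a only by the metadata attached to the nested field 'a'
b = (df.select(col('a').alias('a', metadata={'description': 'xxx'}))
       .select(struct(col('a')).alias('x')))
a.union(b).printSchema(){code}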
gives:
{code:java}
An error occurred while calling o1076.union.
: org.apache.spark.sql.AnalysisException: Union can only be performed on tables
with the compatible column types. struct<a:bigint> <> struct<a:bigint> at the
first column of the second table{code}
and this part:
{code:java}
struct<a:bigint> <> struct<a:bigint>{code}
does not make any sense, because the two printed types are identical.
Since the metadata must match for the union to succeed, it should be included
in the error message.
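Until the message is improved, one possible way to locate the mismatch is to compare the two schemas field by field. The sketch below is not part of Spark: metadata_diffs is a hypothetical helper, and it assumes the schemas are in the dict form returned by PySpark's df.schema.jsonValue(). The two dicts are written by hand to mirror the shape of a and b above (the nullable flags are assumptions), so the snippet runs without a Spark session.

```python
def metadata_diffs(left, right, path=""):
    """Yield (field_path, left_metadata, right_metadata) where metadata differs.

    `left` and `right` are struct schemas in the dict form produced by
    PySpark's `df.schema.jsonValue()`. Fields are paired by position,
    which is also how union matches columns.
    """
    for lf, rf in zip(left["fields"], right["fields"]):
        name = f"{path}.{lf['name']}" if path else lf["name"]
        lm, rm = lf.get("metadata", {}), rf.get("metadata", {})
        if lm != rm:
            yield name, lm, rm
        lt, rt = lf["type"], rf["type"]
        # Nested structs are dicts; simple types are plain strings like "long".
        if (isinstance(lt, dict) and isinstance(rt, dict)
                and lt.get("type") == rt.get("type") == "struct"):
            yield from metadata_diffs(lt, rt, name)


def nested_schema(inner_metadata):
    # Hand-written stand-in for df.schema.jsonValue(): x: struct<a: long>
    return {"type": "struct", "fields": [
        {"name": "x", "nullable": False, "metadata": {}, "type": {
            "type": "struct", "fields": [
                {"name": "a", "type": "long", "nullable": True,
                 "metadata": inner_metadata}]}}]}


schema_a = nested_schema({})
schema_b = nested_schema({"description": "xxx"})

diffs = list(metadata_diffs(schema_a, schema_b))
print(diffs)
```

With real DataFrames the call would presumably be metadata_diffs(a.schema.jsonValue(), b.schema.jsonValue()).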
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)