Hao Ren created SPARK-27855:
-------------------------------

             Summary: Union failed between 2 datasets of the same type 
converted from different dataframes
                 Key: SPARK-27855
                 URL: https://issues.apache.org/jira/browse/SPARK-27855
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.3
            Reporter: Hao Ren


Two Datasets of the same type, converted from different DataFrames, cannot be unioned.

Here is the code to reproduce the problem. It seems `union` only checks the 
schema of the original DataFrame, even though both Datasets have already been 
converted to the same Dataset type.
{code:java}
case class Entity(key: Int, a: Int, b: String)
val df1 = Seq((2,2,"2")).toDF("key", "a", "b").as[Entity]
val df2 = Seq((1,"1",1)).toDF("key", "b", "a").as[Entity]
df1.printSchema
df2.printSchema
df1 union df2
{code}
Result
{code:java}
defined class Entity
df1: org.apache.spark.sql.Dataset[Entity] = [key: int, a: int ... 1 more field]
df2: org.apache.spark.sql.Dataset[Entity] = [key: int, b: string ... 1 more field]
converted
root
 |-- key: integer (nullable = false)
 |-- a: integer (nullable = false)
 |-- b: string (nullable = true)

root
 |-- key: integer (nullable = false)
 |-- b: string (nullable = true)
 |-- a: integer (nullable = false)

org.apache.spark.sql.AnalysisException: Cannot up cast `a` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "a")
- root class: "Entity"
You can either add an expl
{code}
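
A possible workaround, sketched under assumptions rather than offered as the fix for this issue: `Dataset.union` resolves columns by position, so the two inputs can be aligned by name before they are combined. The sketch below shows two options, `unionByName` (available since Spark 2.3.0) and an explicit `select` that reorders the columns before `.as[Entity]`. The object name `UnionSketch`, the method `run`, and the local `SparkSession` settings are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Same case class as in the reproduction above.
case class Entity(key: Int, a: Int, b: String)

// Hypothetical wrapper object; not part of the original report.
object UnionSketch {

  // Builds the two Datasets from the report and unions them in two ways.
  // Returns the collected rows of each approach as sets.
  def run(spark: SparkSession): (Set[Entity], Set[Entity]) = {
    import spark.implicits._

    val df1 = Seq((2, 2, "2")).toDF("key", "a", "b").as[Entity]
    val df2 = Seq((1, "1", 1)).toDF("key", "b", "a").as[Entity]

    // Option 1: match columns by name instead of by position.
    val byName = df1.unionByName(df2)

    // Option 2: reorder df2's columns to the case class field order,
    // then use the positional union.
    val aligned = df1.union(df2.select("key", "a", "b").as[Entity])

    (byName.collect().toSet, aligned.collect().toSet)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-27855-sketch")
      .getOrCreate()
    try {
      val (byName, aligned) = run(spark)
      println(byName)
      println(aligned)
    } finally {
      spark.stop()
    }
  }
}
```

Both options leave the original Datasets untouched. Neither changes the behaviour of `union` itself, which is what this issue reports.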



