Carlos Bribiescas created SPARK-22335:
-----------------------------------------

             Summary: Union for DataSet uses column order instead of types for union
                 Key: SPARK-22335
                 URL: https://issues.apache.org/jira/browse/SPARK-22335
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Carlos Bribiescas
            Priority: Minor


This isn't quite the issue I'm facing, but solving it will probably fix mine.
I see that union uses column order for a DataFrame. That seems "fine" to me, 
since DataFrames aren't typed.
However, for a Dataset, which is supposed to be strongly typed, it actually 
gives the wrong result: if you access the members by name after a union, the 
values come from column position instead. Here is a reproducible case on 2.2.0.

{code:java}
  // Assumes spark-shell, where sc and spark.implicits._ are already in scope.
  case class AB(a: String, b: String)

  val abDf = sc.parallelize(List(("aThing", "bThing"))).toDF("a", "b")
  val baDf = sc.parallelize(List(("bThing", "aThing"))).toDF("b", "a")

  // As this ticket states, it's "Not a problem" for DataFrames.
  abDf.union(baDf).show()

  val abDs = abDf.as[AB]
  val baDs = baDf.as[AB]

  abDs.union(baDs).show()

  // This gives the wrong result: a Dataset[AB] should be mapped by type,
  // not by column order.
  abDs.union(baDs).map(_.a).show()

  // This also gives the wrong result.
  abDs.union(baDs).rdd.take(2)

  // However, this gives the correct result, even though the columns are
  // out of order.
  baDs.map(_.a).show()
  // This is correct too.
  abDs.map(_.a).show()
{code}

So it's inconsistent, and a bug IMO.

I imagine the conversion to a typed Dataset happens lazily instead of up 
front. So either that conversion could be done eagerly, or the union could 
account for differing column order. Again, this is speculation.
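
Until the behaviour changes, one possible workaround is to reorder the second frame's columns by name before the union, so that positional matching lines up with the case class fields. The following is only a sketch under assumptions: the object name UnionSketch, the method alignedUnionValues, and the local SparkSession setup are made up for illustration and are not part of Spark or of this ticket.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Same case class as in the reproduction above.
case class AB(a: String, b: String)

// Hypothetical helper object, named only for this sketch.
object UnionSketch {

  // Builds the two frames from the reproduction, aligns the second one to the
  // first one's column order, unions them, and returns the values of field `a`.
  def alignedUnionValues(spark: SparkSession): Array[String] = {
    import spark.implicits._

    val abDs = Seq(("aThing", "bThing")).toDF("a", "b").as[AB]
    val baDf = Seq(("bThing", "aThing")).toDF("b", "a")

    // Select the columns in abDs's order (a, b) before converting to Dataset[AB],
    // so that the positional union pairs a with a and b with b.
    val alignedDs = baDf.select(abDs.columns.map(col): _*).as[AB]

    abDs.union(alignedDs).map(_.a).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run standalone.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-column-order-sketch")
      .getOrCreate()
    try {
      alignedUnionValues(spark).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

With the columns aligned first, both rows should report "aThing" for field a. Spark 2.3.0 and later also offer Dataset.unionByName, which resolves columns by name, but it is not available in the affected 2.2.0 release.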



