Kaushal Prajapati created SPARK-21316:
-----------------------------------------

             Summary: Dataset Union output is not consistent with the column sequence
                 Key: SPARK-21316
                 URL: https://issues.apache.org/jira/browse/SPARK-21316
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.1.0
            Reporter: Kaushal Prajapati
            Priority: Critical


If I take the union of two datasets with the same schema, the output should 
stay the same even if I change the order of the columns while creating one of 
the datasets. Instead, the values end up under the wrong columns.

I am attaching a code snippet and its output for details.

{code:java}
public class Person {
  private String name;
  private String age;

  // A no-argument constructor lets the bean encoder instantiate Person
  // when rows are deserialized.
  public Person() {}

  public Person(String name, String age) {
    this.name = name;
    this.age = age;
  }

  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public String getAge() { return age; }
  public void setAge(String age) { this.age = age; }
}
{code}


{code:java}
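import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class Test {
  public static void main(String[] args) throws Exception {
    // SparkConnection is a local helper (not shown) that returns a SparkSession.
    SparkSession spark = SparkConnection.getSpark();

    List<Person> list1 = new ArrayList<>();
    list1.add(new Person("kaushal", "25"));
    list1.add(new Person("aman", "26"));

    List<Person> list2 = new ArrayList<>();
    list2.add(new Person("sapan", "25"));
    list2.add(new Person("yati", "26"));

    // The bean encoder yields the columns in the order (age, name).
    Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
    Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));
    ds1.show();
    ds2.show();

    // Reorder the columns of ds1 to (name, age), then union with ds2 (age, name).
    ds1.select("name", "age").as(Encoders.bean(Person.class)).union(ds2).show();
  }
}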
{code}

Output:

{code:java}
+---+-------+
|age|   name|
+---+-------+
| 25|kaushal|
| 26|   aman|
+---+-------+

+---+-----+
|age| name|
+---+-----+
| 25|sapan|
| 26| yati|
+---+-----+

+-------+-----+
|   name|  age|
+-------+-----+
|kaushal|   25|
|   aman|   26|
|     25|sapan|
|     26| yati|
+-------+-----+
{code}
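
The last table is what a union that matches columns by position, not by name, would produce. The plain-Java sketch below models only that difference and does not use Spark: `Table`, `unionByPosition` and `unionByName` are hypothetical names made up for illustration, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionSketch {
    /** A tiny stand-in for a Dataset: ordered column names plus rows of values. */
    static final class Table {
        final List<String> columns;
        final List<List<String>> rows;

        Table(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Positional union: keeps the left column names and appends the right rows unchanged. */
    static Table unionByPosition(Table left, Table right) {
        List<List<String>> rows = new ArrayList<>(left.rows);
        rows.addAll(right.rows);
        return new Table(left.columns, rows);
    }

    /** Name-aligned union: reorders each right row into the left column order first. */
    static Table unionByName(Table left, Table right) {
        List<List<String>> rows = new ArrayList<>(left.rows);
        for (List<String> row : right.rows) {
            List<String> aligned = new ArrayList<>();
            for (String column : left.columns) {
                int idx = right.columns.indexOf(column);
                if (idx < 0) {
                    throw new IllegalArgumentException("missing column: " + column);
                }
                aligned.add(row.get(idx));
            }
            rows.add(aligned);
        }
        return new Table(left.columns, rows);
    }

    public static void main(String[] args) {
        // Same data as in the report: ds1 reordered to (name, age), ds2 left as (age, name).
        Table ds1 = new Table(Arrays.asList("name", "age"),
            Arrays.asList(Arrays.asList("kaushal", "25"), Arrays.asList("aman", "26")));
        Table ds2 = new Table(Arrays.asList("age", "name"),
            Arrays.asList(Arrays.asList("25", "sapan"), Arrays.asList("26", "yati")));

        // prints [[kaushal, 25], [aman, 26], [25, sapan], [26, yati]]
        System.out.println(unionByPosition(ds1, ds2).rows);
        // prints [[kaushal, 25], [aman, 26], [sapan, 25], [yati, 26]]
        System.out.println(unionByName(ds1, ds2).rows);
    }
}
```

The first printed result reproduces the mixed-up rows from the report; the second is the output I expected. In Spark, selecting the same column order on both sides before the union, for example `ds1.select("name", "age").union(ds2.select("name", "age"))`, should behave like the second case.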




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
