Kaushal Prajapati created SPARK-21316:
-----------------------------------------
Summary: Dataset Union output is not consistent with the column sequence
Key: SPARK-21316
URL: https://issues.apache.org/jira/browse/SPARK-21316
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 2.1.0
Reporter: Kaushal Prajapati
Priority: Critical
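If I take the union of two Datasets with the same schema, the output should
stay the same even if I change the order of the columns when creating one of
the Datasets. In the run below, however, the rows from the second Dataset end
up under the wrong column names. I am attaching a code snippet for details.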
{code:java}
// JavaBean used with Encoders.bean(Person.class).
public class Person {
    public String name;
    public String age;

    // No-argument constructor, as the JavaBean convention expects.
    public Person() {}

    public Person(String name, String age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAge() { return age; }
    public void setAge(String age) { this.age = age; }
}
{code}
{code:java}
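import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class Test {
    public static void main(String[] args) throws Exception {
        // SparkConnection is not shown in this report; presumably a helper returning a SparkSession.
        SparkSession spark = SparkConnection.getSpark();

        List<Person> list1 = new ArrayList<>();
        list1.add(new Person("kaushal", "25"));
        list1.add(new Person("aman", "26"));

        List<Person> list2 = new ArrayList<>();
        list2.add(new Person("sapan", "25"));
        list2.add(new Person("yati", "26"));

        Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
        Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));

        ds1.show();
        ds2.show();

        // Reorder ds1's columns to (name, age), then union with ds2, whose columns are (age, name).
        ds1.select("name", "age").as(Encoders.bean(Person.class)).union(ds2).show();
    }
}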
{code}
Output:
{code:java}
+---+-------+
|age|   name|
+---+-------+
| 25|kaushal|
| 26|   aman|
+---+-------+

+---+-----+
|age| name|
+---+-----+
| 25|sapan|
| 26| yati|
+---+-----+

+-------+-----+
|   name|  age|
+-------+-----+
|kaushal|   25|
|   aman|   26|
|     25|sapan|
|     26| yati|
+-------+-----+
{code}
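
The mixed-up rows in the last table are consistent with union matching columns by position rather than by name. The sketch below is a plain-Java model of that behavior, not Spark code. `UnionModel`, `unionByPosition` and `unionByName` are hypothetical names introduced only for illustration; the data mirrors the two Datasets in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (NOT Spark code) of positional vs. name-based union.
// All names here are hypothetical and exist only for this illustration.
class UnionModel {

    // Positional union: right-hand rows are appended unchanged, so their
    // values land under whatever column sits at the same index on the left.
    public static List<List<String>> unionByPosition(List<List<String>> leftRows,
                                                     List<List<String>> rightRows) {
        List<List<String>> out = new ArrayList<>(leftRows);
        out.addAll(rightRows);
        return out;
    }

    // Name-based union: each right-hand row is first reordered to follow
    // the left-hand column order, then appended.
    public static List<List<String>> unionByName(List<String> leftCols,
                                                 List<List<String>> leftRows,
                                                 List<String> rightCols,
                                                 List<List<String>> rightRows) {
        List<List<String>> out = new ArrayList<>(leftRows);
        for (List<String> row : rightRows) {
            List<String> reordered = new ArrayList<>();
            for (String col : leftCols) {
                int idx = rightCols.indexOf(col);
                if (idx < 0) {
                    throw new IllegalArgumentException("Missing column: " + col);
                }
                reordered.add(row.get(idx));
            }
            out.add(reordered);
        }
        return out;
    }

    public static void main(String[] args) {
        // Left side: columns reordered to (name, age), like ds1.select("name", "age").
        List<String> leftCols = Arrays.asList("name", "age");
        List<List<String>> leftRows = Arrays.asList(
                Arrays.asList("kaushal", "25"),
                Arrays.asList("aman", "26"));

        // Right side: columns in (age, name) order, like ds2 in the output above.
        List<String> rightCols = Arrays.asList("age", "name");
        List<List<String>> rightRows = Arrays.asList(
                Arrays.asList("25", "sapan"),
                Arrays.asList("26", "yati"));

        // prints [[kaushal, 25], [aman, 26], [25, sapan], [26, yati]]
        System.out.println(unionByPosition(leftRows, rightRows));
        // prints [[kaushal, 25], [aman, 26], [sapan, 25], [yati, 26]]
        System.out.println(unionByName(leftCols, leftRows, rightCols, rightRows));
    }
}
```

The first printed list reproduces the reported output, the second is what the report expects. In Spark itself, the analogous workaround would be to select the columns of both Datasets in the same order before calling union. Dataset.unionByName, which resolves columns by name, is only available from Spark 2.3.0 onward, so it does not apply to the affected version 2.1.0.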