-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------
(Updated July 11, 2016, 8:37 a.m.)
Review request for hive, Ashutosh Chauhan and Carl Steinbach.
Repository: hive-git
Description
-------
Problem Statement:
When working with complex data formats such as Avro, we often encounter arrays
that contain multiple tuples, where each tuple has a struct schema.
Suppose the struct schema looks like this:
{
  "name": "employee",
  "type": [{
    "type": "record",
    "name": "Employee",
    "namespace": "com.company.Employee",
    "fields": [{
      "name": "empId",
      "type": "int"
    }, {
      "name": "empName",
      "type": "string"
    }, {
      "name": "age",
      "type": "int"
    }, {
      "name": "salary",
      "type": "double"
    }]
  }]
}
Then, when we run a Hive query, the complex array looks like an array of
Employee objects.
Example:
//(array<struct<empId,empName,age,salary>>)
Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
In day-to-day business use cases we often need to sort such a tuple array by
one or more specific fields, such as empId, empName, or salary, in ASC or DESC
order.
Proposal:
I have developed a UDF, 'sort_array_by', which sorts a tuple array by one or
more fields in the ASC or DESC order specified by the user; the default is
ascending order.
Example:
1. Select
sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
output:
array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
2. Select
sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
output:
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
3. Select
sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
output:
array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
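For reviewers who want to check the expected ordering outside Hive, the
multi-field sort described above can be sketched in plain Java. This is only an
illustration and not the code in GenericUDFSortArrayByField.java: the class
SortArrayBySketch, the helper employee(), and the map-based struct
representation are hypothetical. It also assumes that the trailing ASC/DESC
argument applies to all listed fields and that every field value is non-null
and Comparable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Hive UDF implementation.
class SortArrayBySketch {

    // Builds one "struct" as an ordered field-name -> value map.
    static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Returns a sorted copy of rows. args holds one or more field names,
    // optionally followed by "ASC" or "DESC" (ascending when omitted).
    // Assumption: the order keyword applies to every listed field.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Map<String, Object>> sortArrayBy(List<Map<String, Object>> rows, String... args) {
        int fieldCount = args.length;
        boolean descending = false;
        if (fieldCount > 0) {
            String last = args[fieldCount - 1];
            if (last.equalsIgnoreCase("ASC") || last.equalsIgnoreCase("DESC")) {
                descending = last.equalsIgnoreCase("DESC");
                fieldCount--;
            }
        }
        if (fieldCount == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }

        // Chain one comparator per field: earlier fields take precedence,
        // later fields only break ties.
        Comparator<Map<String, Object>> cmp = null;
        for (int i = 0; i < fieldCount; i++) {
            final String field = args[i];
            Comparator<Map<String, Object>> byField =
                (a, b) -> ((Comparable) a.get(field)).compareTo(b.get(field));
            cmp = (cmp == null) ? byField : cmp.thenComparing(byField);
        }
        if (descending) {
            cmp = cmp.reversed();
        }

        List<Map<String, Object>> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        // Same input as example 2 above.
        List<Map<String, Object>> employees = Arrays.asList(
            employee(100, "Foo", 20, 20990),
            employee(500, "Boo", 30, 80990),
            employee(500, "Boo", 30, 50990),
            employee(700, "Harry", 25, 40990),
            employee(100, "Tom", 35, 70990));

        for (Map<String, Object> e : sortArrayBy(employees, "empName", "salary", "ASC")) {
            System.out.println(e);
        }
    }
}
```

Run as written, this prints the rows in the same order as the output of
example 2: the two Boo rows first (lower salary before higher), then Foo,
Harry, and Tom. Unlike the Hive examples, field names here are matched
case-sensitively against the map keys.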
Diffs (updated)
-----
itests/src/test/resources/testconfiguration.properties 1ab914d
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION
ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION
ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION
ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40
ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out PRE-CREATION
ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out PRE-CREATION
ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out PRE-CREATION
ql/src/test/results/clientpositive/show_functions.q.out a811747
ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/49619/diff/
Testing
-------
JUnit test cases and query (.q) files are attached.
Thanks,
Simanchal Das