Simanchal Das created HIVE-14159:
------------------------------------
Summary: sorting of tuple array using multiple field[s]
Key: HIVE-14159
URL: https://issues.apache.org/jira/browse/HIVE-14159
Project: Hive
Issue Type: Improvement
Components: UDF
Reporter: Simanchal Das
Assignee: Simanchal Das
Problem Statement:
When working with complex data structures such as Avro, we often encounter
arrays that contain multiple tuples, where each tuple has a struct schema.
Suppose the struct schema is as follows:
{noformat}
{
"name": "employee",
"type": [{
"type": "record",
"name": "Employee",
"namespace": "com.company.Employee",
"fields": [{
"name": "empId",
"type": "int"
}, {
"name": "empName",
"type": "string"
}, {
"name": "age",
"type": "int"
}, {
"name": "salary",
"type": "double"
}]
}]
}
{noformat}
Then, when we run a Hive query, the complex array looks like an array of
employee objects.
{noformat}
Example:
//(array<struct<empId,empName,age,salary>>)
Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
{noformat}
When implementing day-to-day business use cases, we often need to sort a
tuple array by specific field[s] such as empId, name, salary, etc.
Proposal:
I have developed a UDF 'sort_array_field' which sorts a tuple array by one
or more fields in natural order.
{noformat}
Example:
1.Select
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
output:
array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
2.Select
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
output:
array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
3.Select
sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
output:
array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
{noformat}
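The intended ordering can be modelled outside Hive as a sort on a composite key. The following is only a minimal Python sketch of those semantics, not the UDF implementation itself. The Employee tuple, the Python function body and the use of the schema field names (empName, salary) as sort keys are assumptions made for illustration.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    # Mirrors struct<empId, empName, age, salary> from the schema above.
    empId: int
    empName: str
    age: int
    salary: float


def sort_array_field(rows, *fields):
    """Hypothetical model of the proposed UDF.

    Sorts rows by one or more fields in natural (ascending) order.
    The first field is the primary key; later fields break ties.
    """
    if not fields:
        raise ValueError("at least one field name is required")
    # Python compares tuples element by element, which gives the
    # multi-field ordering directly.
    return sorted(rows, key=lambda row: tuple(getattr(row, f) for f in fields))


# Same data as example 2 above.
employees = [
    Employee(100, "Foo", 20, 20990.0),
    Employee(500, "Boo", 30, 80990.0),
    Employee(500, "Boo", 30, 50990.0),
    Employee(700, "Harry", 25, 40990.0),
    Employee(100, "Tom", 35, 70990.0),
]

by_name_salary = sort_array_field(employees, "empName", "salary")
for emp in by_name_salary:
    print(emp)
```

With the data from example 2, the two Boo rows come first, ordered by salary, followed by Foo, Harry and Tom.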