Rahul Challapalli created DRILL-2264:
----------------------------------------
Summary: Incorrect data when we use aggregate functions with flatten
Key: DRILL-2264
URL: https://issues.apache.org/jira/browse/DRILL-2264
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Reporter: Rahul Challapalli
Assignee: Jason Altekruse
Priority: Critical
git.commit.id.abbrev=6676f2d
Data set:
{code}
{
"uid":1,
"lst_lst" : [[1,2],[3,4]]
}
{
"uid":2,
"lst_lst" : [[1,2],[3,4]]
}
{code}
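To reproduce, the data set can be written to disk with a short script. This is a minimal sketch that assumes the file is named temp.json in the current directory. Where it must actually live depends on the default storage plugin/workspace that resolves `temp.json` in the queries below, and the report does not say which one that is. It writes one top-level object per line, matching the layout shown above, rather than a JSON array.

{code}
import json

# Records copied from the data set in this report.
records = [
    {"uid": 1, "lst_lst": [[1, 2], [3, 4]]},
    {"uid": 2, "lst_lst": [[1, 2], [3, 4]]},
]

# Write one JSON object per line (same layout as the data set above),
# not a single JSON array. The output path is an assumption.
with open("temp.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
{code}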
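The query below returns incorrect results: every (uid, flatten(lst_lst[1]), flatten(lst_lst[0])) group reports 6, whereas the per-group values should be 4, 5, 5 and 6 for each uid (compare the subquery results further down):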
{code}
select uid,MAX( flatten(lst_lst[1]) + flatten(lst_lst[0])) from `temp.json`
group by uid, flatten(lst_lst[1]), flatten(lst_lst[0]);
+------------+------------+
| uid | EXPR$1 |
+------------+------------+
| 1 | 6 |
| 1 | 6 |
| 1 | 6 |
| 1 | 6 |
| 2 | 6 |
| 2 | 6 |
| 2 | 6 |
| 2 | 6 |
+------------+------------+
{code}
However, if the flattens are moved into a subquery, Drill returns the correct data:
{code}
select uid, MAX(l1+l2) from (select uid,flatten(lst_lst[1]) l1,
flatten(lst_lst[0]) l2 from `temp.json`) sub group by uid, l1, l2;
+------------+------------+
| uid | EXPR$1 |
+------------+------------+
| 1 | 4 |
| 1 | 5 |
| 1 | 5 |
| 1 | 6 |
| 2 | 4 |
| 2 | 5 |
| 2 | 5 |
| 2 | 6 |
+------------+------------+
{code}
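The correct values can be cross-checked outside Drill. The sketch below is a hypothetical model, not Drill's implementation. It assumes that two flatten calls in the same SELECT produce the cross product of the two expanded lists, which is what the subquery output above shows (4 rows per uid). Each row is then grouped by (uid, l1, l2), and MAX(l1 + l2) is taken per group.

{code}
from collections import defaultdict
from itertools import product

# Records copied from the data set in this report.
records = [
    {"uid": 1, "lst_lst": [[1, 2], [3, 4]]},
    {"uid": 2, "lst_lst": [[1, 2], [3, 4]]},
]

def expected_max_sum(records):
    """Hypothetical model of:
    select uid, MAX(l1 + l2) ... group by uid, l1, l2
    where l1 = flatten(lst_lst[1]) and l2 = flatten(lst_lst[0]).
    Assumes the two flattens yield the cross product of both lists."""
    groups = defaultdict(list)
    for rec in records:
        for l1, l2 in product(rec["lst_lst"][1], rec["lst_lst"][0]):
            groups[(rec["uid"], l1, l2)].append(l1 + l2)
    # Sort by group key so the output order is deterministic.
    return [(uid, max(sums)) for (uid, _l1, _l2), sums in sorted(groups.items())]

for uid, value in expected_max_sum(records):
    print(uid, value)
{code}

Under this model each uid yields 4, 5, 5, 6, matching the subquery result rather than the all-6 output of the first query.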
Using a single flatten also yields correct results:
{code}
select uid,MAX(flatten(lst_lst[0])) from `temp.json` group by uid,
flatten(lst_lst[0]);
+------------+------------+
| uid | EXPR$1 |
+------------+------------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
+------------+------------+
{code}
Marked as Critical since Drill returns incorrect data. Let me know if you have any other questions.