Rahul Challapalli created DRILL-1897:
----------------------------------------
Summary: Using 'avg' along with 'length' results in wrong output
when nulls are present
Key: DRILL-1897
URL: https://issues.apache.org/jira/browse/DRILL-1897
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Reporter: Rahul Challapalli
git.commit.id.abbrev=9dfa4a1
Dataset :
{code}
{
"col1":1,
"col2":"abc"
}
{
"col1":1,
"col2":null
}
{
"col1":null,
"col2":null
}
{code}
The below query should return 3 instead of 1 (postgres correctly returns 3).
Query :
{code}
select avg(length(col2)) from `a.json`;
+------------+
| EXPR$0 |
+------------+
| 1.0 |
+------------+
{code}
Plan for the above query :
{code}
00-00 Screen
00-01 Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
00-02 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
00-03 Project($f0=[length($0)])
00-04 Scan(groupscan=[EasyGroupScan
[selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json,
numFiles=1, columns=[`col2`],
files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
{code}
The below query which does not involve a second function works as expected even
though nulls are present
{code}
select avg(col1) from `a.json`;
+------------+
| EXPR$0 |
+------------+
| 1.0 |
+------------+
{code}
Let me know if you need any more information.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)