Rahul Challapalli created DRILL-1897:
----------------------------------------

             Summary: Using 'avg' along with 'length' results in wrong output 
when nulls are present
                 Key: DRILL-1897
                 URL: https://issues.apache.org/jira/browse/DRILL-1897
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
            Reporter: Rahul Challapalli


git.commit.id.abbrev=9dfa4a1

Dataset :
{code}
{
 "col1":1,
 "col2":"abc"
}
{
 "col1":1,
 "col2":null
}
{
 "col1":null,
 "col2":null
}
{code}

The below query should return 3 instead of 1 (postgres correctly returns 3).
Query :
{code}
select avg(length(col2)) from `a.json`;
+------------+
|   EXPR$0   |
+------------+
| 1.0        |
+------------+
{code} 

Plan for the above query :
{code}
00-00    Screen
00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
00-03          Project($f0=[length($0)])
00-04            Scan(groupscan=[EasyGroupScan 
[selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json, 
numFiles=1, columns=[`col2`], 
files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
{code}

The below query which does not involve a second function works as expected even 
though nulls are present
{code}
 select avg(col1) from `a.json`;
+------------+
|   EXPR$0   |
+------------+
| 1.0        |
+------------+
{code}

Let me know if you need any more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to