[ https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mehant Baid updated DRILL-2309: ------------------------------- Attachment: DRILL-2309.patch Minor update to the patch. > 'null' is counted with subquery > ------------------------------- > > Key: DRILL-2309 > URL: https://issues.apache.org/jira/browse/DRILL-2309 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Affects Versions: 0.8.0 > Reporter: Chun Chang > Assignee: Mehant Baid > Priority: Critical > Fix For: 0.9.0 > > Attachments: DRILL-2309.patch > > > #Thu Feb 19 18:40:10 EST 2015 > git.commit.id.abbrev=1ceddff > The following query returns correct count involving columns that contains > null value. > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select tt.gbyi, count(tt.nul) > from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by > tt.gbyi order by tt.gbyi; > +------------+------------+ > | gbyi | EXPR$1 | > +------------+------------+ > | 0 | 33580 | > | 1 | 33317 | > | 2 | 33438 | > | 3 | 33535 | > | 4 | 33369 | > | 5 | 32990 | > | 6 | 33661 | > | 7 | 33130 | > | 8 | 33362 | > | 9 | 33364 | > | 10 | 33229 | > | 11 | 33567 | > | 12 | 33379 | > | 13 | 33045 | > | 14 | 33305 | > +------------+------------+ > {code} > But if you add more aggregation to the query, the returned count is wrong > (pay attention to the last column). > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select tt.gbyi, sum(tt.id), > avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from > `complex.json` t) tt group by tt.gbyi order by tt.gbyi; > +------------+------------+------------+------------+ > | gbyi | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 0 | 33445554017 | 499613.0956877819 | 66943 | > | 1 | 33209358334 | 500760.0252919893 | 66318 | > | 2 | 33369118041 | 498091.82200273 | 66994 | > | 3 | 33254533860 | 498696.5063226428 | 66683 | > | 4 | 33393965595 | 501125.64656145993 | 66638 | > | 5 | 33216885506 | 499961.32710397616 | 66439 | > | 6 | 33380205950 | 498875.3923256599 | 66911 | > | 7 | 33405849390 | 501093.43067788356 | 66666 | > | 8 | 33136951190 | 498458.1044031481 | 66479 | > | 9 | 33319291474 | 499967.5392457864 | 66643 | > | 10 | 33339388887 | 499190.47462408233 | 66787 | > | 11 | 33571590550 | 502095.86682194035 | 66863 | > | 12 | 33437342090 | 501708.8141502653 | 66647 | > | 13 | 33071800925 | 498896.453904129 | 66290 | > | 14 | 33448664191 | 501487.4206955959 | 66699 | > +------------+------------+------------+------------+ > [code} > plan for the query returned the wrong result: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select > tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, > t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi; > +------------+------------+ > | text | json | > +------------+------------+ > | 00-00 Screen > 00-01 Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) > 00-02 SingleMergeExchange(sort0=[0 ASC]) > 01-01 SelectionVectorRemover > 01-02 Sort(sort0=[$0], dir0=[ASC]) > 01-03 Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], > EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5]) > 01-04 HashAgg(group=[{0}], agg#0=[$SUM0($1)], > agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)]) > 01-05 HashToRandomExchange(dist0=[[$0]]) > 02-01 HashAgg(group=[{0}], agg#0=[$SUM0($1)], > agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()]) > 02-02 Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0]) > 02-03 Scan(groupscan=[EasyGroupScan > [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, > columns=[`gbyi`, `id`, `fl`, `nul`], > files=[maprfs:/drill/testdata/complex_type/json/complex.json]]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)