[ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-2488:
------------------------------
    Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch

It turns out to be an issue with supported encoding for MergeJoin.  Merge Join 
execution operator currently does not process incoming batches with SV2 or SV4, 
so if there was a Limit or Sort below, we need to insert a 
SelectionVectorRemover below the MJ.  
Uploaded a patch with a simple fix.  [~vkorukanti] could you please review ?  I 
haven't run all tests yet..still in process.

> Wrong result on join between two subqueries with aggregation
> ------------------------------------------------------------
>
>                 Key: DRILL-2488
>                 URL: https://issues.apache.org/jira/browse/DRILL-2488
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: 
> 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.15 seconds)
> {code}
> This result is incorrect, one row is missing
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >                 select
> . . . . . . . . . . . . >                         b1,
> . . . . . . . . . . . . >                         count(distinct a1)
> . . . . . . . . . . . . >                 from
> . . . . . . . . . . . . >                         t1
> . . . . . . . . . . . . >                 group by
> . . . . . . . . . . . . >                         b1
> . . . . . . . . . . . . >                 order by
> . . . . . . . . . . . . >                         b1 limit 5 offset 1
> . . . . . . . . . . . . >         ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         inner join
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         (
> . . . . . . . . . . . . >                 select
> . . . . . . . . . . . . >                         b1,
> . . . . . . . . . . . . >                         count(distinct a1)
> . . . . . . . . . . . . >                 from
> . . . . . . . . . . . . >                         t1
> . . . . . . . . . . . . >                 group by
> . . . . . . . . . . . . >                         b1
> . . . . . . . . . . . . >                 order by
> . . . . . . . . . . . . >                         b1 limit 5 offset 1
> . . . . . . . . . . . . >         ) as sq2(x1, y1)
> . . . . . . . . . . . . >         on
> . . . . . . . . . . . . >                 sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . >                 sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> |     x1     |     y1     |    x10     |    y10     |
> +------------+------------+------------+------------+
> | bbbbb      | 1          | bbbbb      | 1          |
> | ccccc      | 1          | ccccc      | 1          |
> | eeeee      | 1          | eeeee      | 1          |
> | fffff      | 1          | fffff      | 1          |
> +------------+------------+------------+------------+
> 4 rows selected (0.28 seconds)
> {code}
> Explain plan for the wrong result:
> {code}
> 00-01      Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-02        Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-03          MergeJoin(condition=[=($0, $2)], joinType=[inner])
> 00-05            Limit(offset=[1], fetch=[5])
> 00-07              StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-09                Sort(sort0=[$0], dir0=[ASC])
> 00-11                  StreamAgg(group=[{0, 1}])
> 00-13                    Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-15                      Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
> selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
> `a1`]]])
> 00-04            Project(b10=[$0], EXPR$10=[$1])
> 00-06              SelectionVectorRemover
> 00-08                Sort(sort0=[$0], dir0=[ASC])
> 00-10                  Filter(condition=[=($1, $1)])
> 00-12                    Limit(offset=[1], fetch=[5])
> 00-14                      StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-16                        Sort(sort0=[$0], dir0=[ASC])
> 00-17                          StreamAgg(group=[{0, 1}])
> 00-18                            Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
> dir1=[ASC])
> 00-19                              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
> selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
> `a1`]]])
> {code}
> If you turn off merge join, query returns correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >     select
> . . . . . . . . . . . . >         b1,
> . . . . . . . . . . . . >         count(distinct a1)
> . . . . . . . . . . . . >     from
> . . . . . . . . . . . . >         t1
> . . . . . . . . . . . . >     group by
> . . . . . . . . . . . . >         b1
> . . . . . . . . . . . . >     order by
> . . . . . . . . . . . . >         b1 limit 5 offset 1
> . . . . . . . . . . . . >  ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         inner join
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >     select
> . . . . . . . . . . . . >         b1,
> . . . . . . . . . . . . >         count(distinct a1)
> . . . . . . . . . . . . >     from
> . . . . . . . . . . . . >         t1
> . . . . . . . . . . . . >     group by
> . . . . . . . . . . . . >         b1
> . . . . . . . . . . . . >     order by
> . . . . . . . . . . . . >           b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq2(x1, y1)
> . . . . . . . . . . . . > on
> . . . . . . . . . . . . >                 sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . >                 sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> |     x1     |     y1     |    x10     |    y10     |
> +------------+------------+------------+------------+
> | bbbbb      | 1          | bbbbb      | 1          |
> | ccccc      | 1          | ccccc      | 1          |
> | eeeee      | 1          | eeeee      | 1          |
> | fffff      | 1          | fffff      | 1          |
> | ggggg      | 1          | ggggg      | 1          |
> +------------+------------+------------+------------+
> 5 rows selected (0.352 seconds)
> {code}
> cut/paste reproduction
> {code:sql}
> select * from
>  (
>                  select 
>                          b1,                      
>                          count(distinct a1)       
>                  from 
>                          t1                       
>                  group by                 
>                          b1                       
>                  order by                 
>                          b1 limit 5 offset 1      
>          ) as sq1(x1, y1)         
>   
>          inner join 
>   
>          (
>                  select 
>                          b1,                      
>                          count(distinct a1)       
>                  from 
>                          t1                       
>                  group by                 
>                          b1                       
>                  order by                 
>                          b1 limit 5 offset 1      
>          ) as sq2(x1, y1)         
>          on 
>                  sq1.x1 = sq2.x1 and      
>                  sq2.y1 = sq2.y1          
>  ;
> {code}
> This test has been running and passing in regression test suite until 
> framework was switched to a new code, where JSON parsing was replaced with 
> jackson  and for a brief period ( I believe Friday afternoon until now ) this 
> suite was not executed.
> We already have a merge join bug DRILL-2010, but this one seems to be 
> different (small data set) and feels like a recent regression. 
> For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running 
> with merge join disabled until this issue is fixed. Will need to remove alter 
> session from this file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to