[ 
https://issues.apache.org/jira/browse/DRILL-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606373#comment-14606373
 ] 

Jinfeng Ni commented on DRILL-3414:
-----------------------------------

My preliminary analysis seems to indicate that Drill produce correct result, 
while Postgre does not.

The difference between q2.res (postgres output) and q2.out(drill output) start 
at line 1945 - 1947, and the rest of remaining output.

Line 1935 - 1947
Drill:

{code}
true  1960-01-05  488139464 1464418392
true  1960-01-05  488139464 1464418392
true  1960-01-05  488139464 1464418392
true  1960-01-15  471050029 2877568479
true  1960-01-15  471050029 2877568479
true  1960-01-15  471050029 2877568479
true  1960-01-22  459201109 3795970697
true  1960-01-22  459201109 3795970697
true  1960-02-05  439657915 4675286527
true  1960-02-05  439657915 4675286527
true  1960-02-17  429666606 5964286345
true  1960-02-17  429666606 5964286345
true  1960-02-17  429666606 5964286345
{code}

Postgres:
{code}
true  1960-01-05  488139464 1464418392
true  1960-01-05  488139464 1464418392
true  1960-01-05  488139464 1464418392
true  1960-01-15  471050029 2877568479
true  1960-01-15  471050029 2877568479
true  1960-01-15  471050029 2877568479
true  1960-01-22  459201109 3795970697
true  1960-01-22  459201109 3795970697
true  1960-02-05  439657915 4675286527
true  1960-02-05  439657915 4675286527
null  1960-02-17  429666606 1288999818
null  1960-02-17  429666606 1288999818
null  1960-02-17  429666606 1288999818
{code}

We can see for c_integer = 429666606, Drill has c_boolean = true, and sum() as 
5964286345. On postgres, for  c_integer = 429666606,  c_boolean becomes null, 
and sum() becomes 1288999818.

Now, let's look at the data:

{code}
select c_boolean, c_date, c_integer  from j4 where c_integer = 429666606;
+------------+-------------+------------+
| c_boolean  |   c_date    | c_integer  |
+------------+-------------+------------+
| true       | 1960-02-17  | 429666606  |
+------------+-------------+------------+

select c_integer from j1 where c_integer= 429666606;
+------------+
| c_integer  |
+------------+
| 429666606  |
| 429666606  |
| 429666606  |
+------------+=
{code}

Apparently, both table j1 and j4 has c_integer = 429666606, and the left join 
does not produce null value from right side. Therefore, c_boolean should be 
'true', and since window function use c_boolean as partition by column, those 3 
rows from the join operator should belong to the group of 'true'. That means 
sum() for those three rows is 4675286527 + 429666606 * 3 = 5964286345.

In contrast, Postgres seems to put c_boolean as null, and treat the 3 rows as a 
different group, which seems to be not right.




> Window function on a null producing column of an outer join results in the 
> wrong result
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-3414
>                 URL: https://issues.apache.org/jira/browse/DRILL-3414
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>            Priority: Critical
>              Labels: window_funcion
>         Attachments: j1.tar, j4.parquet, query.tar
>
>
> {code:sql}
> select
>         j4.c_boolean,
>         j4.c_date,
>         j4.c_integer,
>         sum(j4.c_integer) over (partition by j4.c_boolean order by j4.c_date, 
> j4.c_integer)
> from    j1
>            left outer join
>            j4 on j1.c_integer = j4.c_integer
> order by 1,2,3;
> {code}
> If window function is on left side, query returns correct result.
> This works:
> {code:sql}
> select
>         j1.c_boolean,
>         j1.c_date,
>         sum(j1.c_integer) over (partition by j1.c_boolean order by j1.c_date)
> from
>         j1
>                 left outer join
>         j4 on j1.c_integer = j4.c_integer
> order by
>         1, 2;
> {code}
> Attached:
> 1. query.tar (q2.sql , q2.res (postgres output), q2.out (drill output) )
> 2. tables : j1.tar, j4.parquet



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to