[ 
https://issues.apache.org/jira/browse/DRILL-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-8381:
----------------------------------
    Description: 
Currently, Drill ignores filters for filtered aggregate calls and returns 
incorrect results.
Here is the example query for which Drill will return incorrect results:
{code:sql}
SELECT count(n_name) FILTER(WHERE n_regionkey = 1) AS nations_count_in_1_region,
count(n_name) FILTER(WHERE n_regionkey = 2) AS nations_count_in_2_region,
count(n_name) FILTER(WHERE n_regionkey = 3) AS nations_count_in_3_region,
count(n_name) FILTER(WHERE n_regionkey = 4) AS nations_count_in_4_region,
count(n_name) FILTER(WHERE n_regionkey = 0) AS nations_count_in_0_region
FROM cp.`tpch/nation.parquet`
{code}
{noformat}
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| nations_count_in_1_region | nations_count_in_2_region | 
nations_count_in_3_region | nations_count_in_4_region | 
nations_count_in_0_region |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| 25                        | 25                        | 25                    
    | 25                        | 25                        |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
{noformat}
But the correct result is
{noformat}
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| nations_count_in_1_region | nations_count_in_2_region | 
nations_count_in_3_region | nations_count_in_4_region | 
nations_count_in_0_region |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| 5                         | 5                         | 5                     
    | 5                         | 5                         |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
{noformat}
Side note:
The query above could be rewritten using PIVOT:
{code:sql}
SELECT `1` nations_count_in_1_region, `2` nations_count_in_2_region, `3` 
nations_count_in_3_region, `4` nations_count_in_4_region, `0` 
nations_count_in_0_region
FROM (SELECT n_name, n_regionkey FROM cp.`tpch/nation.parquet`) 
PIVOT(count(n_name) FOR n_regionkey IN (0, 1, 2, 3, 4))
{code}
And will return correct results when this issue is fixed and Calcite is updated 
to 1.33.0

  was:
Currently, Drill ignores filters for filtered aggregate calls and returns 
incorrect results.
Here is the example query for which Drill will return incorrect results:
{code:sql}
SELECT count(n_name) FILTER(WHERE n_regionkey = 1) AS nations_count_in_1_region,
count(n_name) FILTER(WHERE n_regionkey = 2) AS nations_count_in_2_region,
count(n_name) FILTER(WHERE n_regionkey = 3) AS nations_count_in_3_region,
count(n_name) FILTER(WHERE n_regionkey = 4) AS nations_count_in_4_region,
count(n_name) FILTER(WHERE n_regionkey = 0) AS nations_count_in_0_region
FROM cp.`tpch/nation.parquet`
{code}

{noformat}
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| nations_count_in_1_region | nations_count_in_2_region | 
nations_count_in_3_region | nations_count_in_4_region | 
nations_count_in_0_region |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| 25                        | 25                        | 25                    
    | 25                        | 25                        |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
{noformat}

But correct result is
{noformat}
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| nations_count_in_1_region | nations_count_in_2_region | 
nations_count_in_3_region | nations_count_in_4_region | 
nations_count_in_0_region |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
| 5                         | 5                         | 5                     
    | 5                         | 5                         |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
{noformat}


> Add support for filtered aggregate calls
> ----------------------------------------
>
>                 Key: DRILL-8381
>                 URL: https://issues.apache.org/jira/browse/DRILL-8381
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Vova Vysotskyi
>            Assignee: Vova Vysotskyi
>            Priority: Major
>
> Currently, Drill ignores filters for filtered aggregate calls and returns 
> incorrect results.
> Here is the example query for which Drill will return incorrect results:
> {code:sql}
> SELECT count(n_name) FILTER(WHERE n_regionkey = 1) AS 
> nations_count_in_1_region,
> count(n_name) FILTER(WHERE n_regionkey = 2) AS nations_count_in_2_region,
> count(n_name) FILTER(WHERE n_regionkey = 3) AS nations_count_in_3_region,
> count(n_name) FILTER(WHERE n_regionkey = 4) AS nations_count_in_4_region,
> count(n_name) FILTER(WHERE n_regionkey = 0) AS nations_count_in_0_region
> FROM cp.`tpch/nation.parquet`
> {code}
> {noformat}
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | nations_count_in_1_region | nations_count_in_2_region | 
> nations_count_in_3_region | nations_count_in_4_region | 
> nations_count_in_0_region |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | 25                        | 25                        | 25                  
>       | 25                        | 25                        |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> {noformat}
> But the correct result is
> {noformat}
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | nations_count_in_1_region | nations_count_in_2_region | 
> nations_count_in_3_region | nations_count_in_4_region | 
> nations_count_in_0_region |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> | 5                         | 5                         | 5                   
>       | 5                         | 5                         |
> +---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
> {noformat}
> Side note:
> The query above could be rewritten using PIVOT:
> {code:sql}
> SELECT `1` nations_count_in_1_region, `2` nations_count_in_2_region, `3` 
> nations_count_in_3_region, `4` nations_count_in_4_region, `0` 
> nations_count_in_0_region
> FROM (SELECT n_name, n_regionkey FROM cp.`tpch/nation.parquet`) 
> PIVOT(count(n_name) FOR n_regionkey IN (0, 1, 2, 3, 4))
> {code}
> And will return correct results when this issue is fixed and Calcite is 
> updated to 1.33.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to