[jira] [Updated] (KYLIN-1467) subquery optimization: remove unused subquery output columns

chelubai (JIRA) Thu, 03 Mar 2016 09:02:57 -0800

     [ 
https://issues.apache.org/jira/browse/KYLIN-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chelubai updated KYLIN-1467:
----------------------------
    Description: 
Table 'tb' has more than 10,000,000 rows;
{code:sql}
-- sql-1:
select a from tb group by a;
{code}
The sql-1 result has only 3 rows:
v1
v2
v3

When query kylin through BI tool tableau, the generated sql:
-- sql-2:
SELECT "X___SQL___"."a" AS "a"
FROM (
select a,b,c,d
from tb
group by a,b,c,d  )
"X___SQL___"
GROUP BY "X___SQL___"."a"

the sql-1 and sql-2 is equivalent, but sql-2 fails with message:
 Scan row count exceeded threshold: 1000000, please add filter condition to 
narrow down backend scan range, like where clause.

one solution:
Add a calcite planner optimization rule to kylin-query, to  remove the columns 
of the subquery that is not used outside, both from select, group by and order 
by.

so the sql-2 may be optimized to:
SELECT "X___SQL___"."a" AS "a"
FROM (
select a
from tb
group by a )
"X___SQL___"
GROUP BY "X___SQL___"."a"

This sql run successfully.

Donot know if the exiting products like mysql/oracle/hive/presto/spark... have 
this optimization or not.
Any comment is appreciated.



  was:
Table 'tb' has more than 10,000,000 rows;
-- sql-1:
select a from tb group by a;

The sql-1 result has only 3 rows:
v1
v2
v3

When query kylin through BI tool tableau, the generated sql:
-- sql-2:
SELECT "X___SQL___"."a" AS "a"
FROM (
select a,b,c,d
from tb
group by a,b,c,d  )
"X___SQL___"
GROUP BY "X___SQL___"."a"

the sql-1 and sql-2 is equivalent, but sql-2 fails with message:
 Scan row count exceeded threshold: 1000000, please add filter condition to 
narrow down backend scan range, like where clause.

one solution:
Add a calcite planner optimization rule to kylin-query, to  remove the columns 
of the subquery that is not used outside, both from select, group by and order 
by.

so the sql-2 may be optimized to:
SELECT "X___SQL___"."a" AS "a"
FROM (
select a
from tb
group by a )
"X___SQL___"
GROUP BY "X___SQL___"."a"

This sql run successfully.

Donot know if the exiting products like mysql/oracle/hive/presto/spark... have 
this optimization or not.
Any comment is appreciated.




> subquery optimization:  remove unused subquery output columns
> -------------------------------------------------------------
>
>                 Key: KYLIN-1467
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1467
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: v1.2
>            Reporter: chelubai
>            Assignee: liyang
>            Priority: Minor
>
> Table 'tb' has more than 10,000,000 rows;
> {code:sql}
> -- sql-1:
> select a from tb group by a;
> {code}
> The sql-1 result has only 3 rows:
> v1
> v2
> v3
> When query kylin through BI tool tableau, the generated sql:
> -- sql-2:
> SELECT "X___SQL___"."a" AS "a"
> FROM (
> select a,b,c,d
> from tb
> group by a,b,c,d  )
> "X___SQL___"
> GROUP BY "X___SQL___"."a"
> the sql-1 and sql-2 is equivalent, but sql-2 fails with message:
>  Scan row count exceeded threshold: 1000000, please add filter condition to 
> narrow down backend scan range, like where clause.
> one solution:
> Add a calcite planner optimization rule to kylin-query, to  remove the 
> columns of the subquery that is not used outside, both from select, group by 
> and order by.
> so the sql-2 may be optimized to:
> SELECT "X___SQL___"."a" AS "a"
> FROM (
> select a
> from tb
> group by a )
> "X___SQL___"
> GROUP BY "X___SQL___"."a"
> This sql run successfully.
> Donot know if the exiting products like mysql/oracle/hive/presto/spark... 
> have this optimization or not.
> Any comment is appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (KYLIN-1467) subquery optimization: remove unused subquery output columns

Reply via email to