[ 
https://issues.apache.org/jira/browse/IMPALA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9853:
----------------------------------
    Labels: performance tpcds  (was: )

> Push rank() predicates into sort
> --------------------------------
>
>                 Key: IMPALA-9853
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9853
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: performance, tpcds
>
> TPC-DS Q67 would benefit significantly if we could push the rank() predicate 
> into the sort to do some reduction of unneeded data. The sorter could 
> evaluate this predicate if it had the partition expressions available - as a 
> post-processing step to the in-memory sort for the analytic sort group, it 
> could do a pass over the sorted run, resetting a counter at the start of each 
> partition boundary.
> It might be best to start with tackling IMPALA-3471 by applying the limit 
> within sorted runs, since that doesn't require any planner work.
> {noformat}
> with results as
> (     select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy 
> ,d_moy ,s_store_id
>                   ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
>             from store_sales ,date_dim ,store ,item
>        where  ss_sold_date_sk=d_date_sk
>           and ss_item_sk=i_item_sk
>           and ss_store_sk = s_store_sk
>           and d_month_seq between 1212 and 1212 + 11
>        group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, 
> d_moy,s_store_id)
>  ,
>  results_rollup as
>  (select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, 
> s_store_id, sumsales
>   from results
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, 
> null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, null 
> d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, null d_qoy, 
> null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year
>   union all
>   select i_category, i_class, i_brand, i_product_name, null d_year, null 
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name
>   union all
>   select i_category, i_class, i_brand, null i_product_name, null d_year, null 
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand
>   union all
>   select i_category, i_class, null i_brand, null i_product_name, null d_year, 
> null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class
>   union all
>   select i_category, null i_class, null i_brand, null i_product_name, null 
> d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category
>   union all
>   select null i_category, null i_class, null i_brand, null i_product_name, 
> null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results)
>  select  *
> from (select i_category
>             ,i_class
>             ,i_brand
>             ,i_product_name
>             ,d_year
>             ,d_qoy
>             ,d_moy
>             ,s_store_id
>             ,sumsales
>             ,rank() over (partition by i_category order by sumsales desc) rk
>       from results_rollup) dw2
> where rk <= 100
> order by i_category
>         ,i_class
>         ,i_brand
>         ,i_product_name
>         ,d_year
>         ,d_qoy
>         ,d_moy
>         ,s_store_id
>         ,sumsales
>         ,rk
> limit 100
> {noformat}
> Assigning to myself to fill in more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to