[
https://issues.apache.org/jira/browse/IMPALA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-9853:
----------------------------------
Labels: performance tpcds (was: )
> Push rank() predicates into sort
> --------------------------------
>
> Key: IMPALA-9853
> URL: https://issues.apache.org/jira/browse/IMPALA-9853
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: performance, tpcds
>
> TPC-DS Q67 would benefit significantly if we could push the rank() predicate
> into the sort to do some reduction of unneeded data. The sorter could
> evaluate this predicate if it had the partition expressions available - as a
> post-processing step to the in-memory sort for the analytic sort group, it
> could do a pass over the sorted run, resetting a counter at the start of each
> partition boundary.
> It might be best to start with tackling IMPALA-3471 by applying the limit
> within sorted runs, since that doesn't require any planner work.
> {noformat}
> with results as
> ( select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy
> ,d_moy ,s_store_id
> ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
> from store_sales ,date_dim ,store ,item
> where ss_sold_date_sk=d_date_sk
> and ss_item_sk=i_item_sk
> and ss_store_sk = s_store_sk
> and d_month_seq between 1212 and 1212 + 11
> group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy,
> d_moy,s_store_id)
> ,
> results_rollup as
> (select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,
> s_store_id, sumsales
> from results
> union all
> select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,
> null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy
> union all
> select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, null
> d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy
> union all
> select i_category, i_class, i_brand, i_product_name, d_year, null d_qoy,
> null d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class, i_brand, i_product_name, d_year
> union all
> select i_category, i_class, i_brand, i_product_name, null d_year, null
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class, i_brand, i_product_name
> union all
> select i_category, i_class, i_brand, null i_product_name, null d_year, null
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class, i_brand
> union all
> select i_category, i_class, null i_brand, null i_product_name, null d_year,
> null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category, i_class
> union all
> select i_category, null i_class, null i_brand, null i_product_name, null
> d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
> from results
> group by i_category
> union all
> select null i_category, null i_class, null i_brand, null i_product_name,
> null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
> from results)
> select *
> from (select i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rank() over (partition by i_category order by sumsales desc) rk
> from results_rollup) dw2
> where rk <= 100
> order by i_category
> ,i_class
> ,i_brand
> ,i_product_name
> ,d_year
> ,d_qoy
> ,d_moy
> ,s_store_id
> ,sumsales
> ,rk
> limit 100
> {noformat}
> Assigning to myself to fill in more details.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]