[ 
https://issues.apache.org/jira/browse/IMPALA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159590#comment-17159590
 ] 

Aman Sinha edited comment on IMPALA-9853 at 7/17/20, 3:31 AM:
--------------------------------------------------------------

For a restricted use case such as q67, we could do something along the 
following lines in the planner:  (companion changes will also be needed in the 
backend sorter):
 - In the AnalyticPlanner, check that it is evaluating a single function, 
namely rank() (or similar such as rownum())
 - Check the analyzer's globalstate for the list of conjuncts and identify 
conjuncts with range/equality condition on rank()
 - Drill down into the SlotRef from this conjunct to see it references the 
analytic function SlotRef that is being materialized.  Get the RHS value of the 
conjunct and check if it is NumericLiteral. Let this value be N.
 - If these are satisfied, check the partition-by expr.  Within each partition, 
we want to get the top N rows only based on the order-by expr.
- This means we need to hint the backend to do a special type of sort operation 
(not yet supported in the backend) that only keeps top N rows (sorted on the 
order-by expr) within each of bucket of rows based on the partition-by expr.
For certain situations where NDV of the partition-by expr is very low, this 
degenerates into just doing a Top-N on the {partition-by, order-by} pair since 
each thread of the sort may be only getting a single value of the partition-by 
column. This would simplify the implementation but the NDV of the partition-by 
expr may not be known.



was (Author: amansinha):
For a restricted use case such as q67, we can do the following in the planner:
 - In the AnalyticPlanner, check that it is evaluating a single function, 
namely rank()
 - Check the analyzer's globalstate for the list of conjuncts and identify 
conjuncts with range/equality condition on rank()
 - Drill down into the SlotRef from this conjunct to see it references the 
analytic function SlotRef that is being
       materialized.  Get the RHS value of the conjunct and check if it is 
NumericLiteral
 - If these are satisfied, instead of generating a TotalSort, generate a TopN 
sort using the numeric value above.
This is the general idea based on a short amount of analysis.  I will do 
further investigation .. but mainly thinking of a narrow use case. 


> Push rank() predicates into sort
> --------------------------------
>
>                 Key: IMPALA-9853
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9853
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Aman Sinha
>            Priority: Major
>              Labels: performance, tpcds
>
> TPC-DS Q67 would benefit significantly if we could push the rank() predicate 
> into the sort to do some reduction of unneeded data. The sorter could 
> evaluate this predicate if it had the partition expressions available - as a 
> post-processing step to the in-memory sort for the analytic sort group, it 
> could do a pass over the sorted run, resetting a counter at the start of each 
> partition boundary.
> It might be best to start with tackling IMPALA-3471 by applying the limit 
> within sorted runs, since that doesn't require any planner work.
> {noformat}
> with results as
> (     select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy 
> ,d_moy ,s_store_id
>                   ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
>             from store_sales ,date_dim ,store ,item
>        where  ss_sold_date_sk=d_date_sk
>           and ss_item_sk=i_item_sk
>           and ss_store_sk = s_store_sk
>           and d_month_seq between 1212 and 1212 + 11
>        group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, 
> d_moy,s_store_id)
>  ,
>  results_rollup as
>  (select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, 
> s_store_id, sumsales
>   from results
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy, 
> null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, d_qoy, null 
> d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year, d_qoy
>   union all
>   select i_category, i_class, i_brand, i_product_name, d_year, null d_qoy, 
> null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name, d_year
>   union all
>   select i_category, i_class, i_brand, i_product_name, null d_year, null 
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand, i_product_name
>   union all
>   select i_category, i_class, i_brand, null i_product_name, null d_year, null 
> d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class, i_brand
>   union all
>   select i_category, i_class, null i_brand, null i_product_name, null d_year, 
> null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category, i_class
>   union all
>   select i_category, null i_class, null i_brand, null i_product_name, null 
> d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results
>   group by i_category
>   union all
>   select null i_category, null i_class, null i_brand, null i_product_name, 
> null d_year, null d_qoy, null d_moy, null s_store_id, sum(sumsales) sumsales
>   from results)
>  select  *
> from (select i_category
>             ,i_class
>             ,i_brand
>             ,i_product_name
>             ,d_year
>             ,d_qoy
>             ,d_moy
>             ,s_store_id
>             ,sumsales
>             ,rank() over (partition by i_category order by sumsales desc) rk
>       from results_rollup) dw2
> where rk <= 100
> order by i_category
>         ,i_class
>         ,i_brand
>         ,i_product_name
>         ,d_year
>         ,d_qoy
>         ,d_moy
>         ,s_store_id
>         ,sumsales
>         ,rk
> limit 100
> {noformat}
> Assigning to myself to fill in more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to