[ 
https://issues.apache.org/jira/browse/HIVE-24081?focusedWorklogId=475219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475219
 ]

ASF GitHub Bot logged work on HIVE-24081:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/20 10:06
            Start Date: 27/Aug/20 10:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   ### What changes were proposed in this pull request?
   * Do phase 1 parsing of subquery expressions in order to count CTE 
references in those subqueries
   * Add a config to materialize CTEs with aggregate output only
   
   
   ### Why are the changes needed?
   Improve performance of complex queries that reference the same fully 
aggregate CTE more than once.
   
   ### Does this PR introduce _any_ user-facing change?
   Adds a new config to HiveConf: 
`hive.optimize.cte.materialize.full.aggregate.only`.
   Prior to this patch, if `hive.optimize.cte.materialize.threshold` was higher 
than -1, all non-subquery CTEs were materialized if they were referenced more 
times than the threshold. This patch limits this to fully aggregate CTEs only 
by default. The original behavior can be restored by setting 
`hive.optimize.cte.materialize.full.aggregate.only` to false.
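   
   A minimal sketch of how the two settings might be combined in a session. 
The table `sales`, its columns, and the CTE name `stats` are hypothetical, and 
the sketch assumes that a CTE producing only aggregate values qualifies as 
fully aggregate:
   ```
   -- Hypothetical table and column names, for illustration only.
   -- Consider CTEs referenced more than once for materialization.
   set hive.optimize.cte.materialize.threshold=1;
   -- Default after this patch; set to false to restore the previous behavior.
   set hive.optimize.cte.materialize.full.aggregate.only=true;

   with stats as (
     select avg(amount) as avg_amount, max(amount) as max_amount
     from sales
   )
   select s.id, s.amount
   from sales s
   where s.amount > (select avg_amount from stats)
     and s.amount < (select max_amount from stats);
   ```
   Here `stats` is referenced twice, both times from scalar subqueries, which 
is the case this patch aims to count when comparing against the threshold.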
   
   ### How was this patch tested?
   * New q tests were added.
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_6.q -pl itests/qtest -Pitests
   ```
   * Run query14 with `set hive.optimize.cte.materialize.threshold=3;`
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestTezPerfCliDriver -Dqfile=query14.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 475219)
    Remaining Estimate: 0h
            Time Spent: 10m

> Enable pre-materializing CTEs referenced in scalar subqueries
> -------------------------------------------------------------
>
>                 Key: HIVE-24081
>                 URL: https://issues.apache.org/jira/browse/HIVE-24081
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-11752 introduced materializing CTEs based on the config
> {code}
> hive.optimize.cte.materialize.threshold
> {code}
> The goal of this Jira is to
> * extend the implementation to support materializing CTEs referenced in 
> scalar subqueries
> * add a config to materialize CTEs with aggregate output only



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
