kasakrisz opened a new pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   ### What changes were proposed in this pull request?
   * Do phase 1 parsing of subquery expressions in order to count CTE 
references in those subqueries
   * Add a config to materialize CTEs with aggregate output only
   
   
   ### Why are the changes needed?
   Improve performance of complex queries that reference the same fully 
aggregate CTE more than once.
   
   ### Does this PR introduce _any_ user-facing change?
   Adds a new config to HiveConf: 
`hive.optimize.cte.materialize.full.aggregate.only`.
   Prior to this patch, if `hive.optimize.cte.materialize.threshold` was higher 
than -1, all non-subquery CTEs were materialized if they were referenced more 
times than the threshold. This patch limits materialization to fully aggregate 
CTEs only by default. The original behavior can be restored by setting 
`hive.optimize.cte.materialize.full.aggregate.only` to false.
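
As a rough illustration of the behavior described above, the following HiveQL sketch references one aggregate-only CTE twice, both times from subquery expressions. The table `employees`, its columns, and the CTE name `salary_stats` are hypothetical and do not come from this patch. Whether the CTE is actually materialized depends on the build and the exact threshold semantics, so checking the plan with `EXPLAIN` is advisable.

```sql
-- Hypothetical table and column names, for illustration only.
-- Two references to the CTE exceed a threshold of 1.
set hive.optimize.cte.materialize.threshold=1;
-- Described as the default after this patch; set explicitly here for clarity.
set hive.optimize.cte.materialize.full.aggregate.only=true;

WITH salary_stats AS (
  -- Select list contains only aggregate functions, so the output is fully aggregated.
  SELECT AVG(salary) AS avg_salary, MAX(salary) AS max_salary
  FROM employees
)
SELECT e.name
FROM employees e
WHERE e.salary > (SELECT avg_salary FROM salary_stats)   -- first reference, inside a subquery
  AND e.salary < (SELECT max_salary FROM salary_stats);  -- second reference, inside a subquery
```

Setting `hive.optimize.cte.materialize.full.aggregate.only` to false in the same session would, per the description above, make non-aggregate CTEs eligible for materialization again.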
   
   ### How was this patch tested?
   * New q tests were added.
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_6.q -pl itests/qtest -Pitests
   ```
   * Run query14 with `set hive.optimize.cte.materialize.threshold=3;`
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestTezPerfCliDriver -Dqfile=query14.q -pl itests/qtest -Pitests
   ```




