zabetak opened a new pull request, #5249:
URL: https://github.com/apache/hive/pull/5249

   ### What changes were proposed in this pull request?
   
   1. Add an `applyCteRewriting` phase in `CalcitePlanner` for detecting and using
CTEs; ensure the rewrite logic is consistent with the existing
`hive.optimize.cte.materialize.*` properties.
   2. Model CTEs as materialized views (MVs) and add a utility method in
`HiveMaterializedViewUtils` for mapping a CTE to a `RelOptMaterialization`.
   3. Refactor the core MV rewrite logic in `CalcitePlanner` so that it can be
reused during CTE rewriting to exploit CTEs in a cost-based manner.
   4. Add a `HiveTableSpool` operator to represent CTEs and handle them in the
plan using the new rules `TableScanToSpoolRule` and `RemoveUnusedCteRule`.
   5. Add `TableScanToSpoolRule` and `RemoveUnusedCteRule` to add/remove
spools from the plan.
   6. Enhance the metadata handlers to support the Spool operator.
   7. Add `AggregatedColumns` metadata (along with its handler and metadata
query) to determine whether a CTE is a "full aggregate" at the CBO (RelNode)
level, ensuring consistent behavior with the
`hive.optimize.cte.materialize.full.aggregate.only` property.
   8. Add `HiveSqlTypeUtil.containsSqlType` to detect untyped nulls and skip the
creation of CTEs that contain them, since such CTEs are not supported (HIVE-11217).
   9. Add the `hive.optimize.cte.suggester.class` property and the
`CommonTableExpressionSuggester` interface to provide pluggable CTE detection
logic. Since CTE detection logic can range from basic tree traversal
algorithms to complex workload analysis frameworks, this part needs to be
configurable; there is no one-size-fits-all implementation. The
configuration property also allows proprietary algorithms to be integrated into
HiveServer2 by implementing the necessary APIs and adding the jars to the
classpath.
   10. Add a prototype implementation of the CTE detection logic in
`CommonTableExpressionIdentitySuggester` using `CommonRelSubExprRegisterRule` and
`CommonTableExpressionRegistry`. Although the implementation is rather simple, it
can discover various interesting CTEs, as demonstrated by the tests, and can be
useful in a production environment.
   11. Map spool(s) to `WITH` clauses during the RelNode-to-AST conversion
(`ASTConverter`, `ASTBuilder`, `PlanModifierForASTConv`) to exploit the existing
CTE materialization feature (HIVE-11752).
   12. Slightly modify `SemanticAnalyzer`/`CalcitePlanner` to enable AST-based
CTE materialization (`getMetadata`) after the CBO run.
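   As a rough illustration of the idea behind identity-based CTE detection (item 10), the sketch below registers every non-leaf subtree of a toy plan under a structural digest and reports the digests that occur more than once. The `Node` record, the digest format, and the `suggest` method are hypothetical and deliberately simplified; this is not the code of `CommonRelSubExprRegisterRule` or `CommonTableExpressionRegistry`, which work on Calcite plans.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of identity-based CTE detection.
// Not Hive's CommonRelSubExprRegisterRule / CommonTableExpressionRegistry code.
public class IdentityCteDetection {

  /** Toy plan node: an operator label plus its ordered inputs. */
  public record Node(String op, List<Node> inputs) {
    public static Node of(String op, Node... inputs) {
      return new Node(op, List.of(inputs));
    }
  }

  /** Computes a structural digest bottom-up and counts every non-leaf subtree. */
  private static String register(Node node, Map<String, Integer> registry) {
    if (node.inputs().isEmpty()) {
      return node.op(); // leaves (e.g. bare scans) are not CTE candidates here
    }
    List<String> childDigests = new ArrayList<>();
    for (Node input : node.inputs()) {
      childDigests.add(register(input, registry));
    }
    String digest = node.op() + "(" + String.join(", ", childDigests) + ")";
    registry.merge(digest, 1, Integer::sum);
    return digest;
  }

  /** Returns digests of non-leaf subtrees that occur at least twice, in registration order. */
  public static List<String> suggest(Node root) {
    Map<String, Integer> registry = new LinkedHashMap<>();
    register(root, registry);
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : registry.entrySet()) {
      if (entry.getValue() > 1) {
        candidates.add(entry.getKey());
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    // The same filtered aggregate over table t appears on both sides of a join
    Node left = Node.of("agg[sum(x)]", Node.of("filter[y>0]", Node.of("scan[t]")));
    Node right = Node.of("agg[sum(x)]", Node.of("filter[y>0]", Node.of("scan[t]")));
    Node plan = Node.of("join", left, right);
    System.out.println(suggest(plan));
    // prints [filter[y>0](scan[t]), agg[sum(x)](filter[y>0](scan[t]))]
  }
}
```

   Both the repeated `filter` subtree and the enclosing `agg` subtree are reported; a practical suggester would probably keep only maximal candidates or leave the final choice to the cost-based rewrite.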
   
   ### Why are the changes needed?
   * Open the road for cost-based CTE optimizations in Hive
   * Pluggable/Extensible CTE detection logic
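   The pluggability described in item 9 follows the common pattern of naming an implementation class in the configuration and instantiating it reflectively. A minimal sketch under that assumption is shown below; the `CteSuggester` interface, its `suggest` signature, and the plain-string plan representation are hypothetical stand-ins, not the actual `CommonTableExpressionSuggester` API, and a plain `Map` replaces `HiveConf`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: config-driven loading of a pluggable CTE suggester.
// CteSuggester is NOT Hive's CommonTableExpressionSuggester; names are illustrative.
public class SuggesterLoading {

  /** Hypothetical plug-in contract: given sub-plan digests, return CTE candidates. */
  public interface CteSuggester {
    List<String> suggest(List<String> subPlans);
  }

  /** Default used when no class is configured: suggests nothing. */
  public static class NoCteSuggester implements CteSuggester {
    @Override
    public List<String> suggest(List<String> subPlans) {
      return new ArrayList<>();
    }
  }

  /** Example "third-party" implementation: suggests every sub-plan it is given. */
  public static class SuggestAllSuggester implements CteSuggester {
    @Override
    public List<String> suggest(List<String> subPlans) {
      return new ArrayList<>(subPlans);
    }
  }

  /** Instantiates the class named under {@code key}, or the default if unset. */
  public static CteSuggester loadSuggester(Map<String, String> conf, String key) {
    String className = conf.get(key);
    if (className == null || className.isEmpty()) {
      return new NoCteSuggester();
    }
    try {
      // asSubclass throws ClassCastException if the class does not implement CteSuggester
      Class<? extends CteSuggester> clazz =
          Class.forName(className).asSubclass(CteSuggester.class);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate suggester " + className, e);
    }
  }

  public static void main(String[] args) {
    String key = "hive.optimize.cte.suggester.class";
    Map<String, String> conf = Map.of(key, SuggestAllSuggester.class.getName());
    CteSuggester suggester = loadSuggester(conf, key);
    System.out.println(suggester.suggest(List.of("scan(store_sales)"))); // [scan(store_sales)]
  }
}
```

   Checking the type with `asSubclass` before instantiation makes a misconfigured class name fail immediately with a clear error instead of later during planning.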
   
   ### Does this PR introduce _any_ user-facing change?
   No, not by default. After tuning the new or existing CTE properties, query
plans may change, which can affect performance.
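   The existing `hive.optimize.cte.materialize.*` properties that the CTE rewrite stays consistent with can be read as a simple gate on materialization. The sketch below shows one plausible reading of how `hive.optimize.cte.materialize.threshold` and `hive.optimize.cte.materialize.full.aggregate.only` combine. It assumes that a negative threshold disables the feature and that a CTE qualifies once it is referenced at least `threshold` times; whether a CTE is a "full aggregate" is taken as an input (in this PR it is derived from the new `AggregatedColumns` metadata). The `CteMaterializationGate` class is hypothetical and Hive's exact checks may differ.

```java
// Hypothetical sketch of how the hive.optimize.cte.materialize.* settings could gate
// CTE materialization. Comparison semantics are assumptions, not taken from Hive's source.
public class CteMaterializationGate {

  /**
   * @param referenceCount    how many times the CTE is referenced in the query
   * @param isFullAggregate   whether the CTE counts as a "full aggregate" (computed elsewhere)
   * @param threshold         value of hive.optimize.cte.materialize.threshold (negative disables)
   * @param fullAggregateOnly value of hive.optimize.cte.materialize.full.aggregate.only
   */
  public static boolean shouldMaterialize(int referenceCount, boolean isFullAggregate,
      int threshold, boolean fullAggregateOnly) {
    if (threshold < 0) {
      return false; // feature disabled
    }
    if (referenceCount < threshold) {
      return false; // not referenced often enough
    }
    // When restricted to full aggregates, other CTEs are never materialized
    return !fullAggregateOnly || isFullAggregate;
  }

  public static void main(String[] args) {
    System.out.println(shouldMaterialize(3, false, 2, false)); // true
    System.out.println(shouldMaterialize(3, false, 2, true));  // false
    System.out.println(shouldMaterialize(3, true, -1, false)); // false
  }
}
```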
   
   ### Is the change a dependency upgrade?
   No
   
   ### How was this patch tested?
   Tests for:
   * CTE detection logic using the `CommonTableExpressionPrintSuggester` 
(`TestTezTPCDS30TBPerfCliDriver`)
   * the CTE feature end-to-end (`cte_cbo_rewrite_0.q`)
   * consistency of the CTE rewrite with
`hive.optimize.cte.materialize.full.aggregate.only` (`cte_mat_12.q`)
   * spool JSON serialization (`cte_cbo_plan_json.q`)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

