[
https://issues.apache.org/jira/browse/TAJO-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485001#comment-14485001
]
Hyunsik Choi commented on TAJO-1406:
------------------------------------
Hi guys,
Two approaches seem to have been suggested so far. Both look good to me, and I
have some comments on each.
For the SQL rewrite approach:
* The SQL rewrite approach would work for simple queries. But it will become a
heavy burden when it has to handle complicated SQL statements involving unions,
correlated subqueries, derived tables, and table aliases. Also, I'm not sure
that this approach can handle all WITH clause cases. Later, we may have to
completely rewrite all the code related to the WITH clause when we hit a
limitation that prevents us from handling some cases.
* SQL rewriting is error-prone, and the rewritten SQL is hard to validate.
* The SQL rewrite approach requires another SQL handler and SQL statement
builder. But we already have the logical planner and query rewrite engine, so
we don't need to reinvent the wheel.
For the temporary table approach:
* This approach is intuitive, and it works well for many WITH cases.
* A temporary table requires materialization. As a result, the query optimizer
may miss some opportunities to push multiple SQL blocks into one execution
block.
My suggestion is to combine both approaches as follows:
First, it would be good to start with query rewriting for the WITH clause.
However, this rewriting should be done in the logical planner and query rewrite
engine instead of around SQLAnalyzer. We should use this approach as much as
possible because it gives more opportunities to push multiple SQL blocks into
one execution block, probably reducing disk materialization. For this approach,
we should implement the algebra and LogicalNode for the WITH clause. Then, we
should implement a QueryRewriteEngine that rewrites the WITH clause into
subqueries. LogicalPlanner, the optimizer, and the rewriter are the right
places to handle this problem. When they hit a limitation in handling some WITH
clauses, we can improve the logical planner and rewriter.
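To make the target of this rewrite concrete, here is a rough sketch of the
equivalence I have in mind, not actual planner output. It reuses the orders
table from the issue description; the threshold 1000 is just an illustrative
value. A WITH list element that is referenced once, such as

{code:sql}
WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
)
SELECT region
FROM regional_sales
WHERE total_sales > 1000;
{code}

could be planned as if it had been written with a derived table:

{code:sql}
-- Same query with the WITH list element inlined as a table subquery.
SELECT region
FROM (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
) regional_sales
WHERE total_sales > 1000;
{code}

The second form is an ordinary table subquery, so in principle the existing
optimizer rules would apply to it unchanged.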
Besides, the rewriting approach does not handle the case where the same WITH
list element is used in multiple query blocks. For this, we may need the
temporary table approach. We should also improve the Tajo distributed query
engine to fully support temporary tables, which requires improving the DAG
framework and shuffles. I think this is also required for scalar subqueries.
So, I propose that we first survey all the requirements for both the WITH
clause and scalar subqueries. Then, based on those requirements, we should add
temporary table support before anything else. After that, the remaining work
would be easier.
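As a sketch of the multiple-reference case, take regional_sales from the issue
description, which is read twice inside top_regions. The statements below only
illustrate the intended semantics: plain CREATE TABLE AS is a stand-in for
whatever temporary table mechanism we end up with, and regional_sales_tmp is a
made-up name.

{code:sql}
-- Hypothetical: materialize the shared WITH list element once.
CREATE TABLE regional_sales_tmp AS
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;

-- Both references read the materialized result instead of
-- recomputing the aggregation over orders.
SELECT region
FROM regional_sales_tmp
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales_tmp);
{code}

The second statement also contains a scalar subquery, which is why I think the
two features share requirements.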
What do you think about my suggestion? If you agree with it, I'd like to
discuss the details.
> Support WITH clause without RECURSIVE
> -------------------------------------
>
> Key: TAJO-1406
> URL: https://issues.apache.org/jira/browse/TAJO-1406
> Project: Tajo
> Issue Type: New Feature
> Components: parser, planner/optimizer
> Reporter: Dongjoon Hyun
> Assignee: Seungun Choe
>
> The WITH clause is a widely used expression in SQL, e.g., in TPC-DS Q1,
> Q2, Q4, Q5, Q11, Q14, Q23, Q24, Q30, Q31, Q33, Q39, Q47, Q51, Q54, Q56, Q57,
> Q58, Q59, Q60, Q64, Q74, Q75, Q77, Q78, Q80, Q81, Q83, Q95, Q97.
> Refer to the following query from
> http://www.postgresql.org/docs/9.4/static/queries-with.html.
> {code:sql}
> WITH regional_sales AS (
>     SELECT region, SUM(amount) AS total_sales
>     FROM orders
>     GROUP BY region
> ), top_regions AS (
>     SELECT region
>     FROM regional_sales
>     WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
> )
> SELECT region,
>        product,
>        SUM(quantity) AS product_units,
>        SUM(amount) AS product_sales
> FROM orders
> WHERE region IN (SELECT region FROM top_regions)
> GROUP BY region, product;
> {code}
> WITH RECURSIVE is not supported in this issue. It should be handled as a
> separate issue.