[ 
https://issues.apache.org/jira/browse/HIVE-24606?focusedWorklogId=536513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-536513
 ]

ASF GitHub Bot logged work on HIVE-24606:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jan/21 14:23
            Start Date: 15/Jan/21 14:23
    Worklog Time Spent: 10m 
      Work Description: okumin commented on a change in pull request #1873:
URL: https://github.com/apache/hive/pull/1873#discussion_r558338464



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -1411,47 +1414,92 @@ public String toString() {
     }
   }
 
-  private List<Task<?>> toRealRootTasks(List<CTEClause> execution) {
-    List<Task<?>> cteRoots = new ArrayList<>();
-    List<Task<?>> cteLeafs = new ArrayList<>();
-    List<Task<?>> curTopRoots = null;
-    List<Task<?>> curBottomLeafs = null;
-    for (CTEClause current : execution) {
-      if (current.parents.isEmpty() && curTopRoots != null) {
-        cteRoots.addAll(curTopRoots);
-        cteLeafs.addAll(curBottomLeafs);
-        curTopRoots = curBottomLeafs = null;

Review comment:
       This looks like the root cause. [The case I put in the 
ticket](https://issues.apache.org/jira/browse/HIVE-24606) has the 
following dependencies:
   
   - `<root> -> a1`
   - `<root> -> x`
   - `<root> -> a2`
   - `a2 -> a1`
   
   But the old implementation traverses the CTEs in the order `a1` -> `x` -> 
`a2` -> `<root>`, which is determined by the AST.
   As a result, by the time it visits `a2`, the information about `a1` is gone 
and it fails to link `a2` to `a1`.
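
To illustrate the point, here is a minimal sketch (not the code in this PR, and not Hive's API) of deriving the execution order from explicit dependency edges instead of from whichever CTE happened to be visited just before. The class `CteOrderSketch`, the method `dependencyOrder`, and the `readsFrom` map are hypothetical names used only for this example; the node names are the ones from the dependency list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not SemanticAnalyzer code.
class CteOrderSketch {

  // Returns the nodes ordered so that each node comes after everything it reads from.
  // readsFrom maps a node to the nodes it depends on (e.g. a2 -> [a1]).
  static List<String> dependencyOrder(Map<String, List<String>> readsFrom) {
    List<String> order = new ArrayList<>();
    Set<String> done = new HashSet<>();
    Set<String> inProgress = new HashSet<>();
    for (String node : readsFrom.keySet()) {
      visit(node, readsFrom, done, inProgress, order);
    }
    return order;
  }

  // Depth-first post-order walk: dependencies are emitted before the node itself.
  private static void visit(String node, Map<String, List<String>> readsFrom,
      Set<String> done, Set<String> inProgress, List<String> order) {
    if (done.contains(node)) {
      return;
    }
    if (!inProgress.add(node)) {
      throw new IllegalStateException("Cyclic dependency at " + node);
    }
    for (String dep : readsFrom.getOrDefault(node, Collections.emptyList())) {
      visit(dep, readsFrom, done, inProgress, order);
    }
    inProgress.remove(node);
    done.add(node);
    order.add(node);
  }

  public static void main(String[] args) {
    Map<String, List<String>> readsFrom = new LinkedHashMap<>();
    readsFrom.put("<root>", Arrays.asList("a1", "x", "a2"));
    readsFrom.put("a2", Arrays.asList("a1"));
    readsFrom.put("x", Collections.emptyList());
    readsFrom.put("a1", Collections.emptyList());
    System.out.println(dependencyOrder(readsFrom));
  }
}
```

Because the `a2 -> a1` edge is stored explicitly, `a1` is always placed before `a2` whatever order the map is filled in, and visiting `x` in between does not discard anything.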




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 536513)
    Time Spent: 0.5h  (was: 20m)

> Multi-stage materialized CTEs can lose intermediate data
> --------------------------------------------------------
>
>                 Key: HIVE-24606
>                 URL: https://issues.apache.org/jira/browse/HIVE-24606
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.3.7, 3.1.2, 4.0.0
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before the 
> preceding stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve the 
> dependency between multi-stage materialized CTEs when a non-materialized CTE 
> cuts in between them.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTasks` will traverse the CTEs in the order `a1`, `x`, `a2`. This 
> means the dependency between `a1` and `a2` is ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result, which is missing the two rows from `a2`.
> {code:java}
> +-----+
> | id  |
> +-----+
> | a1  |
> | x   |
> +-----+
> {code}
> For reference, I ran this test at revision 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
