[jira] [Commented] (HIVE-9041) Generate better plan for queries containing both union and multi-insert [Spark Branch]

Chao (JIRA) Thu, 11 Dec 2014 17:32:10 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243556#comment-14243556
 ]


Chao commented on HIVE-9041:
----------------------------

I just found another bug regarding IOContext, which shows up when caching is turned on.
Taking the sample query above as an example, I currently get this plan:

{noformat}
   MW 1 (table0)   MW 2 (table1)   MW 3 (table0)   MW 4 (table1)
      \            /                 \             /
       \          /                   \           /
        \        /                     \         /
         \      /                       \       /
           RW 1                           RW 2
{noformat}

Suppose the MapWorks are executed from left to right, and suppose we are 
running with a single thread.
Then the following will happen:
1. executing MW 1: since this is the first time we access table0, initialize 
IOContext and make the input path point to table0;
2. executing MW 2: since this is the first time we access table1, initialize 
IOContext and make the input path point to table1;
3. executing MW 3: since this is the second time we access table0, *do not* 
initialize IOContext, and use the copy saved in step 2, *which points to table1*.

Step 3 will then fail.
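
To make the sequence above concrete, here is a minimal Java sketch of the failure mode. It is only a model under assumptions: the class IOContextSketch, its fields, and the "seen tables" check are hypothetical and are not Hive's actual IOContext code. It assumes a single context shared by all MapWorks on one thread, which is initialized only the first time a table is accessed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the failure; none of these names are Hive's real classes. */
public class IOContextSketch {
    // Assumption: one input path shared by every MapWork running on the thread.
    private String inputPath;
    // Assumption: tables already accessed once are treated as cached.
    private final Set<String> seenTables = new HashSet<>();

    /** Returns the input path that a MapWork reading the given table ends up using. */
    public String execute(String table) {
        if (seenTables.add(table)) {
            // First access: initialize the context and point it at this table.
            inputPath = table;
        }
        // Later accesses skip initialization and reuse whatever was saved last.
        return inputPath;
    }

    /** Runs the MapWorks in order on one shared context and records each path used. */
    public static List<String> run(String... tables) {
        IOContextSketch ctx = new IOContextSketch();
        List<String> used = new ArrayList<>();
        for (String table : tables) {
            used.add(ctx.execute(table));
        }
        return used;
    }

    public static void main(String[] args) {
        // MW 1..MW 4 from the plan above, executed left to right.
        System.out.println(run("table0", "table1", "table0", "table1"));
        // prints [table0, table1, table1, table1]
    }
}
```

In this model the third entry is table1 even though MW 3 reads table0, which corresponds to the failure in step 3. MW 4 only gets the right path because it happens to read the table that was initialized last.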

> Generate better plan for queries containing both union and multi-insert 
> [Spark Branch]
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-9041
>                 URL: https://issues.apache.org/jira/browse/HIVE-9041
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Chao
>
> This is a follow-up for HIVE-8920. For queries like:
> {code}
> from (select * from table0 union all select * from table1) s
> insert overwrite table table3 select s.x, count(1) group by s.x
> insert overwrite table table4 select s.y, count(1) group by s.y;
> {code}
> Currently we generate the following plan:
> {noformat}
>     M1    M2
>       \  / \
>        U3   R5
>        |
>        R4
> {noformat}
> It's better, however, to have the following plan:
> {noformat}
>    M1  M2
>    |\  /|
>    | \/ |
>    | /\ |
>    R4  R5
> {noformat}
> Also, we can do some research in this JIRA to see if it's possible
> to remove UnionWork once and for all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
