[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104671#comment-14104671 ]

Chao commented on HIVE-7810:
----------------------------

(Sorry, I pressed Enter accidentally.)

I ran EXPLAIN on the two queries and noticed two things:
1. For the first query, the plan generates FileSinks instead of ReduceSinks, which is strange.
2. For the first query, one MapWork is not in the dependency graph; maybe it gets ignored for the UnionWork.


> Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7810
>                 URL: https://issues.apache.org/jira/browse/HIVE-7810
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Chao
>
> An INSERT OVERWRITE TABLE query behaves strangely when the following settings are used:
> {noformat}
> set hive.optimize.union.remove=true;
> set hive.mapred.supports.subdirectories=true;
> {noformat}
> We expect the following two sets of queries to return the same result, but they do not.
> 1)
> {noformat}
> insert overwrite table outputTbl1
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b;
> select * from outputTbl1 order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1     1
> 1     2
> 2     1
> 2     2
> 3     1
> 3     2
> 7     1
> 7     2
> 8     2
> 8     2
> 8     2
> {noformat}
> 2) 
> {noformat}
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1     1
> 1     1
> 1     2
> 2     1
> 2     1
> 2     2
> 3     1
> 3     1
> 3     2
> 7     1
> 7     1
> 7     2
> 8     1
> 8     1
> 8     2
> 8     2
> 8     2
> {noformat}
> Some rows are missing from the result of the first set of queries.
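To make "some rows are missing" concrete, the two listings above can be compared as multisets. This is a small Python sketch, not part of Hive: the rows are copied by hand from the two result listings, and treating the keys as integers is an assumption made only for convenience.

```python
from collections import Counter

# Rows copied from result 1 (read back from outputTbl1 after INSERT OVERWRITE).
result_insert = [
    (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
    (7, 1), (7, 2), (8, 2), (8, 2), (8, 2),
]

# Rows copied from result 2 (the same union query run as a plain SELECT).
result_select = [
    (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2),
    (3, 1), (3, 1), (3, 2), (7, 1), (7, 1), (7, 2),
    (8, 1), (8, 1), (8, 2), (8, 2), (8, 2),
]

# Multiset difference: Counter subtraction keeps only positive counts, so
# "missing" holds rows (with multiplicity) present in result 2 but not in
# result 1, and "extra" holds rows present only in result 1.
missing = Counter(result_select) - Counter(result_insert)
extra = Counter(result_insert) - Counter(result_select)

if __name__ == "__main__":
    print(sorted(missing.elements()))  # rows lost by the INSERT OVERWRITE path
    print(sum(extra.values()))         # rows that appear only in result 1
```

With these listings it prints `[(1, 1), (2, 1), (3, 1), (7, 1), (8, 1), (8, 1)]` and `0`: six rows are missing, all with values = 1, and nothing extra appears. That is consistent with the entire output of the first branch (`select key, 1 as values from inputTbl1`) being dropped, though the listings alone do not prove which branch lost the rows.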



--
This message was sent by Atlassian JIRA
(v6.2#6252)
