[ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557115#comment-13557115
 ] 

Namit Jain commented on HIVE-3699:
----------------------------------

+1

We had so many checked in tests with wrong results.
                
> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3699
>                 URL: https://issues.apache.org/jira/browse/HIVE-3699
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>            Reporter: Alexandre Fouché
>            Assignee: Navis
>         Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am doing a query with multiple INSERT OVERWRITE to multiple tables in order 
> to scan the dataset only 1 time, and i end up having all these tables with 
> the same content ! It seems the GROUP BY query that returns results is 
> overwriting all the temp tables.
> Weird enough, if i had further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all temp tables, even the ones that would 
> have been wrong content are all correctly populated.
> This is the misbehaving query:
>     FROM nikon
>     INSERT OVERWRITE TABLE e1
>     SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
>     WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
>     INSERT OVERWRITE TABLE e2
>     SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
>     WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
>     ;
> It launches only one MR job and here are the results. Why does table 'e1' 
> contains results from table 'e2' ?! Table 'e1' should have been empty (see 
> individual SELECTs further below)
>     hive> SELECT * from e1;
>     OK
>     NULL    2
>     1627575 25
>     1627576 70
>     1690950 22
>     1690952 42
>     1696705 199
>     1696706 66
>     1696730 229
>     1696759 85
>     1696893 218
>     Time taken: 0.229 seconds
>     hive> SELECT * from e2;
>     OK
>     NULL    2
>     1627575 25
>     1627576 70
>     1690950 22
>     1690952 42
>     1696705 199
>     1696706 66
>     1696730 229
>     1696759 85
>     1696893 218
>     Time taken: 0.11 seconds
> Here is are the result to the indiviual queries (only the second query 
> returns a result set):
>     hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
>     WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
>     (...)
>     OK
>           <- There are no results, this is normal
>     Time taken: 41.471 seconds
>     hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
>     WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
>     (...)
>     OK
>     NULL  2
>     1627575 25
>     1627576 70
>     1690950 22
>     1690952 42
>     1696705 199
>     1696706 66
>     1696730 229
>     1696759 85
>     1696893 218
>     Time taken: 39.607 seconds
>     

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to