[ 
https://issues.apache.org/jira/browse/HIVE-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891397#comment-15891397
 ] 

Sergey Shelukhin commented on HIVE-15870:
-----------------------------------------

This looks related to predicate pushdown. The logs for the MM and non-MM runs
are nearly identical when merging of map job files is disabled and the input
format (IF) is set to HiveInputFormat.
There is only one scan of tbl2, and in both runs Hive reports reading 2 files
whose sizes match pairwise; the only difference is that in the MM case the
files are in separate directories (one per insert).
However, in the MM case one of the files is filtered out entirely. Since the
join needs the rows from both files, the MM query returns no result.
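
A sketch of the session settings such a comparison would presumably use. The
exact property names are an assumption (the comment names neither of them), as
is the mapping of "merging map job files" to hive.merge.mapfiles:
{noformat}
-- Assumed: disable merging of small files at the end of map-only jobs
set hive.merge.mapfiles=false;
-- Assumed: use the plain input format instead of the combining one
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- With the data from the description, the last query should return one row:
--   1   value1   value2
-- (tbl1.id=1 matches 'value1' from one insert and 'value2' from the other;
--  tbl1.id=2 has no match in tbl2)
{noformat}
One possible, untested way to check the pushdown theory would be to rerun the
last query with hive.optimize.index.filter=false and see whether the row comes
back for the MM table.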

> MM tables - parquet_join test fails
> -----------------------------------
>
>                 Key: HIVE-15870
>                 URL: https://issues.apache.org/jira/browse/HIVE-15870
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> All the selects produce results, except for the last query.
> Judging by the MM logs, the inputs are read correctly. The cause is probably
> something Parquet-specific related to having multiple files in a table.
> {noformat}
> set hive.optimize.index.filter = true;
> set hive.auto.convert.join=false;
> CREATE TABLE tbl1(id INT) STORED AS PARQUET;
> INSERT INTO tbl1 VALUES(1), (2);
> CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET;
> INSERT INTO tbl2 VALUES(1, 'value1');
> INSERT INTO tbl2 VALUES(1, 'value2');
> select tbl1.id, t1.value
> FROM tbl1
> JOIN (SELECT * FROM tbl2 WHERE value='value2') t1 ON tbl1.id=t1.id;
> select tbl1.id, t1.value
> FROM tbl1
> JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id;
> select tbl1.id, t1.value, t2.value
> FROM tbl1
> JOIN tbl2 t1 ON tbl1.id=t1.id
> JOIN tbl2 t2 ON tbl1.id=t2.id;
> select tbl1.id, t1.value, t2.value
> FROM tbl1
> JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
> JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
> {noformat}


