[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query

Navis (JIRA) Thu, 23 Aug 2012 18:46:46 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Navis updated HIVE-1173:
------------------------

    Status: Patch Available  (was: Open)

Passed all tests
                
> Partition pruner cancels pruning if non-deterministic function present in 
> filtering expression only in joins is present in query
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1173
>                 URL: https://issues.apache.org/jira/browse/HIVE-1173
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.1, 0.4.0, 0.10.0
>            Reporter: Vladimir Klimontovich
>            Assignee: Navis
>
> Brief description:
> case 1) a non-deterministic function is present in the partition condition 
> and joins are present in the query => the partition pruner doesn't filter 
> partitions based on the condition
> case 2) a non-deterministic function is present in the partition condition 
> and joins aren't present in the query => the partition pruner does filter 
> partitions based on the condition
> It's illogical for pruning to depend on the presence of joins in the query.
> Example:
> Consider the following sequence of Hive queries:
> 1) Create a non-deterministic function:
> create temporary function UDF2 as 'UDF2';
> {{
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.hive.ql.udf.UDFType;
>
> // Returns its argument unchanged; declared non-deterministic via the annotation.
> @UDFType(deterministic=false)
> public class UDF2 extends UDF {
>     public String evaluate(String val) {
>         return val;
>     }
> }
> }}
> 2) Create tables:
> CREATE TABLE Main (
>       a STRING,
>       b INT
> )
> PARTITIONED BY(part STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE;
> ALTER TABLE Main ADD PARTITION (part="part1") LOCATION 
> "/hive-join-test/part1/";
> ALTER TABLE Main ADD PARTITION (part="part2") LOCATION 
> "/hive-join-test/part2/";
> CREATE TABLE Joined (
>       a STRING,
>       f STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '10'
> STORED AS TEXTFILE
> LOCATION '/hive-join-test/join/';
> 3) Run the first query:
> select 
>       m.a,
>       m.b
> from Main m
> where
>       part > UDF2('part0') AND part = 'part1';
> The pruner works for this query; only part1 is read: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1
> 4) Run the second query (with a join):
> select 
>       m.a,
>       j.a,
>       m.b
> from Main m
> join Joined j on
>       j.a=m.a
> where
>       part > UDF2('part0') AND part = 'part1';
> The pruner doesn't work; both partitions of Main are read: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join
> 5) Also, let's try running the query with a MAPJOIN hint:
> select /*+MAPJOIN(j)*/ 
>       m.a,
>       j.a,
>       m.b
> from Main m
> join Joined j on
>       j.a=m.a
> where
>       part > UDF2('part0') AND part = 'part1';
> The result is the same; the pruner doesn't work: 
> mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2
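
The behavior the report asks for can be illustrated with a small, self-contained model. This is only a sketch and not Hive's partition pruner: `PruningSketch`, `Conjunct`, and `prune` are hypothetical names, and the assumption that a pruner may skip non-deterministic conjuncts and still prune on the deterministic ones is an illustration of the expected outcome, not a description of the attached patch. It shows that for `part > UDF2('part0') AND part = 'part1'` the deterministic conjunct alone is enough to narrow the input to part1, with or without a join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PruningSketch {

    /** One conjunct of a WHERE clause over the partition column (hypothetical model). */
    static final class Conjunct {
        final Predicate<String> test;
        final boolean deterministic;

        Conjunct(Predicate<String> test, boolean deterministic) {
            this.test = test;
            this.deterministic = deterministic;
        }
    }

    /**
     * Keeps a partition unless some deterministic conjunct rejects it.
     * Non-deterministic conjuncts are skipped here; in this model they
     * would be left for row-level filtering at execution time.
     */
    static List<String> prune(List<String> partitions, List<Conjunct> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String part : partitions) {
            boolean keep = true;
            for (Conjunct c : conjuncts) {
                if (c.deterministic && !c.test.test(part)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(part);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("part1", "part2");
        List<Conjunct> where = Arrays.asList(
            // part > UDF2('part0'): non-deterministic, skipped during pruning
            new Conjunct(p -> p.compareTo("part0") > 0, false),
            // part = 'part1': deterministic, safe to evaluate at plan time
            new Conjunct(p -> p.equals("part1"), true));
        System.out.println(prune(partitions, where)); // prints [part1]
    }
}
```

In this model the join plays no role in the decision, which matches the reporter's point that pruning should not depend on whether the query contains a join.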

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
