BELUGA BEHR created HIVE-20317:
----------------------------------

             Summary: Spark Dynamic Partition Pruning - Use Stats to Determine 
Partition Count
                 Key: HIVE-20317
                 URL: https://issues.apache.org/jira/browse/HIVE-20317
             Project: Hive
          Issue Type: Improvement
          Components: Spark
    Affects Versions: 3.1.0, 4.0.0
            Reporter: BELUGA BEHR


{code:xml|hive-site.xml}
<!-- HMS -->
<property>
<name>hive.metastore.limit.partition.request</name>
<value>2</value>
</property>
{code}

{code:sql}
CREATE TABLE partitioned_user(
        firstname VARCHAR(64),
        lastname  VARCHAR(64)
) PARTITIONED BY (country VARCHAR(64))
STORED AS PARQUET;

CREATE TABLE country(
    name VARCHAR(64)
) STORED AS PARQUET;

insert into partitioned_user partition (country='USA') values ("John", "Doe");
insert into partitioned_user partition (country='UK') values ("Sir", "Arthur");
insert into partitioned_user partition (country='FR') values ("Jacque", 
"Martin");

insert into country values ('USA');

set hive.execution.engine=spark;
set hive.spark.dynamic.partition.pruning=true;
explain select * from partitioned_user u where u.country in (select c.name from 
country c);
-- Error while compiling statement: FAILED: SemanticException 
MetaException(message:Number of partitions scanned (=3) on table 
'partitioned_user' exceeds limit (=2). This is controlled on the metastore 
server by hive.metastore.limit.partition.request.)
{code}

The EXPLAIN plan generation fails because there are three partitions involved 
in this query.  However, since Spark DPP is enabled, Hive should be able to use 
table stats to know that the {{country}} table only has one record and 
therefore there will only need to be one partitioned scanned and allow this 
query to execute.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to