[jira] [Updated] (HIVE-28581) Support Partition Pruning stats optimization for Iceberg tables

Denys Kuzmenko (Jira) Fri, 18 Oct 2024 03:33:53 -0700


     [ https://issues.apache.org/jira/browse/HIVE-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Denys Kuzmenko updated HIVE-28581:
----------------------------------
    Description: 
Add support for the partition pruning stats optimization on Iceberg tables.

{code}
create external table ice01 (`i` int, `t` timestamp) 
    partitioned by (year int, month int, day int) 
stored by iceberg tblproperties ('format-version'='2', 
'write.summary.partition-limit'='10');

insert into ice01 (i, year, month, day) values
(1, 2023, 10, 3),
(2, 2023, 10, 3),
(2, 2023, 10, 3),
(3, 2023, 10, 4),
(4, 2023, 10, 4);
{code}
explain
select i from ice01 where year=2023 and month = 10 and day = 3;
{code}
POSTHOOK: type: QUERY
POSTHOOK: Input: default@ice01
POSTHOOK: Input: default@ice01@year=2023/month=10/day=3
POSTHOOK: Output: hdfs://### HDFS PATH ###
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: ice01
                  filterExpr: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
                  Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
                    Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: i (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
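
The plan above reports Num rows: 3 for the scan restricted to year=2023/month=10/day=3. As a rough illustration only, the toy Python model below groups the inserted rows by partition and sums the counts of the partitions that match the predicate. The helper names (partition_row_counts, estimate_rows) are hypothetical, and this is not Hive's or Iceberg's implementation; where the real per-partition counts come from is left open here.

```python
from collections import Counter

# Rows from the INSERT statement in the description, as (i, year, month, day).
ROWS = [
    (1, 2023, 10, 3),
    (2, 2023, 10, 3),
    (2, 2023, 10, 3),
    (3, 2023, 10, 4),
    (4, 2023, 10, 4),
]


def partition_row_counts(rows):
    """Hypothetical stand-in for per-partition record counts."""
    return Counter((year, month, day) for _, year, month, day in rows)


def estimate_rows(counts, year=None, month=None, day=None):
    """Sum the counts of partitions matching the equality predicates.

    A value of None means "no filter on this partition column".
    """
    wanted = (year, month, day)
    return sum(
        n
        for part, n in counts.items()
        if all(w is None or w == p for w, p in zip(wanted, part))
    )


counts = partition_row_counts(ROWS)
print(estimate_rows(counts, year=2023, month=10, day=3))  # 3
print(estimate_rows(counts, year=2023, month=10))         # 5
```

In this model, a predicate that covers all partition columns with equality yields the row count of the single surviving partition (3 here) rather than the whole table's count (5), which is consistent with the statistics shown in the plan.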

> Support Partition Pruning stats optimization for Iceberg tables
> ---------------------------------------------------------------
>
>                 Key: HIVE-28581
>                 URL: https://issues.apache.org/jira/browse/HIVE-28581
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone) 
>            Reporter: Denys Kuzmenko
>            Priority: Major



