[ https://issues.apache.org/jira/browse/HIVE-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denys Kuzmenko updated HIVE-28581:
----------------------------------
Description:
Add support for the partition-pruning stats optimization to Iceberg tables.
{code}
create external table ice01 (`i` int, `t` timestamp)
partitioned by (year int, month int, day int)
stored by iceberg tblproperties ('format-version'='2',
'write.summary.partition-limit'='10');
insert into ice01 (i, year, month, day) values
(1, 2023, 10, 3),
(2, 2023, 10, 3),
(2, 2023, 10, 3),
(3, 2023, 10, 4),
(4, 2023, 10, 4);

explain
select i from ice01 where year = 2023 and month = 10 and day = 3;
{code}
{code}
POSTHOOK: type: QUERY
POSTHOOK: Input: default@ice01
POSTHOOK: Input: default@ice01@year=2023/month=10/day=3
POSTHOOK: Output: hdfs://### HDFS PATH ###
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: ice01
                  filterExpr: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
                  Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
                    Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: i (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
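As a cross-check on the plan, the expected row count for the pruned partition can be derived directly from the five inserted rows: partition year=2023/month=10/day=3 holds 3 records and day=4 holds 2, so "Num rows: 3" on the TableScan is what partition-level stats should yield. The sketch below is a hypothetical illustration in plain Python (not Hive or Iceberg code); the helper names `records_per_partition` and `pruned_row_count` are made up for this example.

```python
from collections import Counter

# Rows from the INSERT statement in the repro: (i, year, month, day).
# Column `t` is not populated by the insert, so it is omitted here.
ROWS = [
    (1, 2023, 10, 3),
    (2, 2023, 10, 3),
    (2, 2023, 10, 3),
    (3, 2023, 10, 4),
    (4, 2023, 10, 4),
]


def records_per_partition(rows):
    """Hypothetical helper: count records per (year, month, day) partition."""
    return Counter((year, month, day) for _, year, month, day in rows)


def pruned_row_count(rows, year, month, day):
    """Hypothetical helper: row estimate after pruning to one partition,
    i.e. that partition's record count (0 if the partition does not exist)."""
    return records_per_partition(rows).get((year, month, day), 0)


if __name__ == "__main__":
    counts = records_per_partition(ROWS)
    for (y, m, d), n in sorted(counts.items()):
        print(f"year={y}/month={m}/day={d}: {n} records")
    # Should agree with "Num rows: 3" on the TableScan in the plan.
    print(pruned_row_count(ROWS, 2023, 10, 3))
```

Only the record count is modeled here; the plan's "Data size: 48" depends on Hive's size estimation and is not reproduced.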
> Support Partition Pruning stats optimization for Iceberg tables
> ---------------------------------------------------------------
>
> Key: HIVE-28581
> URL: https://issues.apache.org/jira/browse/HIVE-28581
> Project: Hive
> Issue Type: Improvement
> Security Level: Public(Viewable by anyone)
> Reporter: Denys Kuzmenko
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)