[ https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy Fingerman updated HIVE-28266:
-------------------------------------
Description:
In Hive Iceberg, every table has a corresponding metadata table "*.data_files"
that contains information about the files holding the table's data.
Running select count(*) on a data_files metadata table returns the number of
rows in the data table instead of the number of data files listed in the
metadata table.
{code:sql}
CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT)
STORED BY ICEBERG STORED AS ORC
TBLPROPERTIES ('external.table.purge'='true', 'format-version'='2');
INSERT INTO x VALUES
('amy', 35, 123412344),
('adxfvy', 36, 123412534),
('amsdfyy', 37, 123417234),
('asafmy', 38, 123412534);
INSERT INTO x VALUES
('amerqwy', 39, 123441234),
('amyxzcv', 40, 123341234),
('erweramy', 45, 122341234);
SELECT * FROM default.x.data_files;
-- Returns 2 records in the output
SELECT count(*) FROM default.x.data_files;
-- Returns 7 instead of 2
{code}
was:
In Hive Iceberg, every table has a corresponding metadata table "*.data_files"
that contains info about the files that contain table's data.
select count(*) from a data_file metadata table returns number of rows in the
data table instead of number of data files from the metadata table.
CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by
iceberg stored as orc TBLPROPERTIES
('external.table.purge'='true','format-version'='2');
insert into x values
('amy', 35, 123412344),
('adxfvy', 36, 123412534),
('amsdfyy', 37, 123417234),
('asafmy', 38, 123412534);
insert into x values
('amerqwy', 39, 123441234),
('amyxzcv', 40, 123341234),
('erweramy', 45, 122341234);
Select * from default.x.data_files;
-- Returns 2 records in the output
Select count(*) from default.x.data_files;
-- Returns 7 instead of 2
> Iceberg: select count(*) from *.data_files metadata tables gives wrong result
> -----------------------------------------------------------------------------
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
> Issue Type: Bug
> Reporter: Dmitriy Fingerman
> Assignee: Dmitriy Fingerman
> Priority: Major
>
> In Hive Iceberg, every table has a corresponding metadata table
> "*.data_files" that contains information about the files holding the table's
> data. Running select count(*) on a data_files metadata table returns the
> number of rows in the data table instead of the number of data files listed
> in the metadata table.
>
> {code:sql}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT)
> STORED BY ICEBERG STORED AS ORC
> TBLPROPERTIES ('external.table.purge'='true', 'format-version'='2');
> INSERT INTO x VALUES
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> INSERT INTO x VALUES
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> SELECT * FROM default.x.data_files;
> -- Returns 2 records in the output
> SELECT count(*) FROM default.x.data_files;
> -- Returns 7 instead of 2
> {code}
>
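The arithmetic behind the reported numbers can be sketched in a few lines of Python. This is only an illustration, not Hive or Iceberg code: it assumes each INSERT produced exactly one data file (consistent with the two rows returned by the select * query), and the file paths are hypothetical placeholders. The `record_count` field mirrors the column of the same name in the Iceberg data_files metadata table.

```python
# Hypothetical model of the two rows exposed by default.x.data_files after
# the two INSERT statements in the report. Paths are placeholders.
data_files = [
    {"file_path": "data/file-1.orc", "record_count": 4},  # first INSERT: 4 rows
    {"file_path": "data/file-2.orc", "record_count": 3},  # second INSERT: 3 rows
]

# What count(*) on the metadata table should return: one row per data file.
expected_count = len(data_files)

# What the report observes: the number of rows in the data table itself,
# which equals the sum of the per-file record counts.
observed_count = sum(f["record_count"] for f in data_files)

print(expected_count, observed_count)
```

Under these assumptions the expected result is the length of the list, while the observed result matches the total number of records across the files.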