Dmitriy Fingerman created HIVE-28266:
----------------------------------------
Summary: Iceberg: select count(*) from *.data_files metadata tables gives wrong result
Key: HIVE-28266
URL: https://issues.apache.org/jira/browse/HIVE-28266
Project: Hive
Issue Type: Bug
Reporter: Dmitriy Fingerman
Assignee: Dmitriy Fingerman
In Hive, every Iceberg table has a corresponding metadata table "*.data_files"
that lists the data files holding the table's data.
Running select count(*) against a data_files metadata table returns the number
of rows in the data table instead of the number of data files listed in the
metadata table.
CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by
iceberg stored as orc TBLPROPERTIES
('external.table.purge'='true','format-version'='2');
insert into x values
('amy', 35, 123412344),
('adxfvy', 36, 123412534),
('amsdfyy', 37, 123417234),
('asafmy', 38, 123412534);
insert into x values
('amerqwy', 39, 123441234),
('amyxzcv', 40, 123341234),
('erweramy', 45, 122341234);
Select * from default.x.data_files;
-- Returns 2 records in the output
Select count(*) from default.x.data_files;
-- Returns 7 instead of 2
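The following is a minimal Python model of the expected semantics, not Hive's implementation. It assumes each INSERT above wrote exactly one ORC data file, which is consistent with Select * returning 2 records. The class, function names, and file paths are hypothetical. count(*) over the metadata table should count data-file entries (2). The reported value, 7, equals the sum of record_count, which is the row count of x.

```python
from dataclasses import dataclass


@dataclass
class DataFileEntry:
    """One row of x.data_files (simplified, hypothetical model of the metadata table)."""
    file_path: str
    record_count: int


# Hypothetical paths; the report does not show the real file names.
# Assumes one data file per INSERT statement in the reproduction.
data_files = [
    DataFileEntry("file-1.orc", 4),  # first INSERT: 4 rows
    DataFileEntry("file-2.orc", 3),  # second INSERT: 3 rows
]


def count_star_metadata(files):
    """Expected result of count(*) on the metadata table: one per data file."""
    return len(files)


def count_star_data(files):
    """Row count of the data table x: sum of record_count over its data files."""
    return sum(f.record_count for f in files)


print(count_star_metadata(data_files), count_star_data(data_files))  # 2 7
```

The bug reported here is that Hive returns the second number where the first is expected.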
--
This message was sent by Atlassian Jira
(v8.20.10#820010)