Dmitriy Fingerman created HIVE-28266:
----------------------------------------

             Summary: Iceberg: select count(*) from *.data_files metadata 
tables gives wrong result
                 Key: HIVE-28266
                 URL: https://issues.apache.org/jira/browse/HIVE-28266
             Project: Hive
          Issue Type: Bug
            Reporter: Dmitriy Fingerman
            Assignee: Dmitriy Fingerman


In Hive Iceberg, every table has a corresponding metadata table "*.data_files" 
that contains information about the files that hold the table's data.

select count(*) from a data_files metadata table returns the number of rows in 
the data table instead of the number of data files listed in the metadata table.


CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
iceberg stored as orc TBLPROPERTIES 
('external.table.purge'='true','format-version'='2');

insert into x values 
('amy', 35, 123412344),
('adxfvy', 36, 123412534),
('amsdfyy', 37, 123417234),
('asafmy', 38, 123412534);

insert into x values 
('amerqwy', 39, 123441234),
('amyxzcv', 40, 123341234),
('erweramy', 45, 122341234);

Select * from default.x.data_files;
-- Returns 2 records in the output

Select count(*) from default.x.data_files;
-- Returns 7 instead of 2
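
One way to see where the 7 comes from (a sketch, not verified as part of this 
report) is to project the per-file row counts from the same metadata table. 
It assumes Hive exposes the standard Iceberg data_files columns file_path and 
record_count, and that each insert above wrote exactly one data file.

```sql
-- Assumption: file_path and record_count are the standard Iceberg
-- data_files columns, and each insert above produced one data file.
SELECT file_path, record_count FROM default.x.data_files;
-- Under that assumption the two rows should carry record_count values
-- of 4 and 3, which add up to the 7 that count(*) reports.
```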



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
