Tamas Mate created IMPALA-12537:
-----------------------------------
Summary: Iceberg returns a deleted file's name when
INPUT__FILE__NAME is in the select list
Key: IMPALA-12537
URL: https://issues.apache.org/jira/browse/IMPALA-12537
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Tamas Mate
On S3, Impala returns 3 rows when {{INPUT__FILE__NAME}} is in the select list,
but on HDFS it returns only the 2 data file names.
The iceberg_query_metadata table originally had 3 records (i=1, i=2, i=3), and
the second one (i=2) was deleted. I observed the following test failure:
{code:none}
20:28:08 SELECT i, INPUT__FILE__NAME from
functional_parquet.iceberg_query_metadata tbl;
20:28:08
20:28:08 -- 2023-11-02 12:14:08,868 INFO MainThread: Started query
4448ab74f3a639a5:2b8cf6ea00000000
20:28:08 -- 2023-11-02 12:14:08,997 ERROR MainThread: Comparing
QueryTestResults (expected vs actual):
20:28:08
row_regex:[1-9]\d*|0,'.*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq'
==
1,'s3a://impala-test-uswest2-3/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/ed4288065b402e80-c705c47400000000_264336845_data.0.parq'
20:28:08
row_regex:[1-9]\d*|0,'.*/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/.*.parq'
==
2,'s3a://impala-test-uswest2-3/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/data/0a4510a5e8578659-1607b64000000000_435794406_data.0.parq'
20:28:08 Number of rows returned (expected vs actual): 2 != 3
{code}
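To help read the log above, here is a minimal sketch that checks the two actual rows shown against the expected pattern. It assumes the pattern is applied with Python's {{re.match}}, which is an assumption about the test framework and is not confirmed by the log. The names {{ROW_PATTERN}}, {{ACTUAL_ROWS}} and {{rows_match}} are hypothetical. The third actual row does not appear in the log excerpt, so it is not included here.

```python
import re

# Pattern copied from the "expected" side of the log, without the
# "row_regex:" prefix. Adjacent raw strings are concatenated unchanged.
ROW_PATTERN = (
    r"[1-9]\d*|0,'.*/test-warehouse/iceberg_test/hadoop_catalog/ice/"
    r"iceberg_query_metadata/data/.*.parq'"
)

# The two "actual" rows that appear in the log excerpt.
ACTUAL_ROWS = [
    "1,'s3a://impala-test-uswest2-3/test-warehouse/iceberg_test/"
    "hadoop_catalog/ice/iceberg_query_metadata/data/"
    "ed4288065b402e80-c705c47400000000_264336845_data.0.parq'",
    "2,'s3a://impala-test-uswest2-3/test-warehouse/iceberg_test/"
    "hadoop_catalog/ice/iceberg_query_metadata/data/"
    "0a4510a5e8578659-1607b64000000000_435794406_data.0.parq'",
]

# Row counts reported on the last line of the log (expected vs actual).
EXPECTED_ROW_COUNT = 2
ACTUAL_ROW_COUNT = 3


def rows_match(pattern, rows):
    """Return one boolean per row: does the row match at its start?

    Assumption: matching uses re.match (anchored at the start only).
    """
    return [re.match(pattern, row) is not None for row in rows]


if __name__ == "__main__":
    print("per-row pattern match:", rows_match(ROW_PATTERN, ACTUAL_ROWS))
    print("row counts equal:", EXPECTED_ROW_COUNT == ACTUAL_ROW_COUNT)
```

Under that assumption both rows satisfy the pattern, including the row with i=2, so the failure is reported only through the row count (2 != 3) rather than through a pattern mismatch.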
cc: [~boroknagyz], [~gaborkaszab]