Joe McDonnell created IMPALA-13898:
--------------------------------------
Summary: Tuple cache produces incorrect result when querying scale_db.num_partitions_1234_blocks_per_partition_1
Key: IMPALA-13898
URL: https://issues.apache.org/jira/browse/IMPALA-13898
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
Tuple caching generates the same key for these two queries:
{noformat}
select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1
select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1 or j=2;
{noformat}
This is a scenario from catalog_service/test_large_num_partitions.py. It is a
correctness issue.
scale_db.num_partitions_1234_blocks_per_partition_1 is an unusual table in which
all of the partitions point to the same location / file. It also has only
partition columns, so the contents of the file don't matter. As a result, the
j=1 and j=2 partitions both point to the same file. The partition information
is not included in the cache key, so the two queries are indistinguishable.
We'll need to expand what we put in the cache key to handle this scenario.
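
To illustrate the collision, here is a toy model in Python. It is only a sketch and not Impala's actual key computation: ScanRange, file_only_key, partition_aware_key, and the file path are all hypothetical, and it assumes the key is derived from the set of distinct files a scan touches. Under that assumption, a file-only key cannot tell the two queries apart, while a key that also hashes each partition's values can.

```python
import hashlib
from typing import NamedTuple


class ScanRange(NamedTuple):
    """Hypothetical stand-in for one partition's file being scanned."""
    path: str
    partition: tuple  # partition column values, e.g. (("j", 1),)


def file_only_key(table, ranges):
    # Assumed buggy behavior: only the distinct files contribute to the key.
    files = sorted({r.path for r in ranges})
    return hashlib.sha256(repr((table, files)).encode()).hexdigest()


def partition_aware_key(table, ranges):
    # Possible fix: hash each (partition values, file) pair.
    entries = sorted((r.partition, r.path) for r in ranges)
    return hashlib.sha256(repr((table, entries)).encode()).hexdigest()


TABLE = "scale_db.num_partitions_1234_blocks_per_partition_1"
SHARED_FILE = "/hypothetical/shared/data_file"  # every partition points here

# where j=1
q1 = [ScanRange(SHARED_FILE, (("j", 1),))]
# where j=1 or j=2
q2 = [
    ScanRange(SHARED_FILE, (("j", 1),)),
    ScanRange(SHARED_FILE, (("j", 2),)),
]

collides = file_only_key(TABLE, q1) == file_only_key(TABLE, q2)
distinct = partition_aware_key(TABLE, q1) != partition_aware_key(TABLE, q2)
print("file-only keys collide:", collides)
print("partition-aware keys differ:", distinct)
```

In this model the file-only keys match because both queries reduce to the same single file, whereas the partition-aware keys differ because the second query contributes an extra (j=2, file) entry.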