[
https://issues.apache.org/jira/browse/IMPALA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944439#comment-17944439
]
Yida Wu edited comment on IMPALA-13964 at 4/14/25 6:46 PM:
-----------------------------------------------------------
It seems that I can repro the issue with the latest master branch, but haven't
looked into it yet.
Start server:
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py
--impalad_args='--tuple_cache="/impala/Impala/tmp:5GB"
--tuple_cache_debug_dump_dir="/impala/Impala/tmp"
--default_query_options="enable_tuple_cache=true,enable_tuple_cache_verification=true"'
-s 1{code}
Run below query twice:
{code:java}
use tpcds_parquet;
with frequent_ss_items as (
select substr(i_item_desc,1,30) itemdesc, i_item_sk item_sk, d_date solddate,
count(*) cnt
from store_sales, date_dim, item
where ss_sold_date_sk = d_date_sk
and ss_item_sk = i_item_sk
and d_year in (2000, 2001, 2002, 2003)
group by substr(i_item_desc,1,30), i_item_sk, d_date
having count(*) > 4
),
max_store_sales as (
select max(csales) tpcds_cmax
from (
select c_customer_sk, sum(ss_quantity * ss_sales_price) csales
from store_sales, customer, date_dim
where ss_customer_sk = c_customer_sk
and ss_sold_date_sk = d_date_sk
and d_year in (2000, 2001, 2002, 2003)
group by c_customer_sk
) x
),
best_ss_customer as (
select c_customer_sk, sum(ss_quantity * ss_sales_price) ssales
from store_sales, customer
where ss_customer_sk = c_customer_sk
group by c_customer_sk
having sum(ss_quantity * ss_sales_price) > (50 / 100.0) * (select * from
max_store_sales)
)
select sum(sales)
from (
select cs_quantity * cs_list_price sales
from catalog_sales, date_dim
where d_year = 2000
and d_moy = 2
and cs_sold_date_sk = d_date_sk
and cs_item_sk in (select item_sk from frequent_ss_items)
and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)
union all
select ws_quantity * ws_list_price sales
from web_sales, date_dim
where d_year = 2000
and d_moy = 2
and ws_sold_date_sk = d_date_sk
and ws_item_sk in (select item_sk from frequent_ss_items)
and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer)
) y
limit 100;
{code}
Results:
{code:java}
2025-04-14 11:44:41 [Exception] ERROR: Query 0b4be12751fac76e:bbf61da300000000
failed:
Inconsistent tuple cache found: Result '[(Tomorrow economic others shoul 14041
2001-09-30 1)]' of file
'/impala/Impala/tmp/tuple-cache-debug-dump/026106f7f3c533d8f3c73572639a3bda_3896506291/0b4be12751fac76e:bbf61da300000019_119.bad'
doesn't exist in the reference file:
'/impala/Impala/tmp/tuple-cache-debug-dump/026106f7f3c533d8f3c73572639a3bda_3896506291/0b4be12751fac76e:bbf61da300000019_119_8a42acc001bfc4b3:eba01ff800000019_119_ref.bad'.
{code}
was (Author: baggio000):
It seems that I can repro the issue with the latest master branch, but haven't
looked into it yet.
Start server:
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py
--impalad_args='--tuple_cache="/impala/Impala/tmp:5GB"
--tuple_cache_debug_dump_dir="/impala/Impala/tmp"
--default_query_options="enable_tuple_cache=true,enable_tuple_cache_verification=true"'
-s 1{code}
Run below two queries twice in sequence:
{code:java}
use tpcds_parquet;
with frequent_ss_items as (
select substr(i_item_desc,1,30) itemdesc, i_item_sk item_sk, d_date solddate,
count(*) cnt
from store_sales, date_dim, item
where ss_sold_date_sk = d_date_sk
and ss_item_sk = i_item_sk
and d_year in (2000, 2001, 2002, 2003)
group by substr(i_item_desc,1,30), i_item_sk, d_date
having count(*) > 4
),
max_store_sales as (
select max(csales) tpcds_cmax
from (
select c_customer_sk, sum(ss_quantity * ss_sales_price) csales
from store_sales, customer, date_dim
where ss_customer_sk = c_customer_sk
and ss_sold_date_sk = d_date_sk
and d_year in (2000, 2001, 2002, 2003)
group by c_customer_sk
) x
),
best_ss_customer as (
select c_customer_sk, sum(ss_quantity * ss_sales_price) ssales
from store_sales, customer
where ss_customer_sk = c_customer_sk
group by c_customer_sk
having sum(ss_quantity * ss_sales_price) > (50 / 100.0) * (select * from
max_store_sales)
)
select c_last_name, c_first_name, sales
from (
select c_last_name, c_first_name, sum(cs_quantity * cs_list_price) sales
from catalog_sales, customer, date_dim
where d_year = 2000
and d_moy = 2
and cs_sold_date_sk = d_date_sk
and cs_item_sk in (select item_sk from frequent_ss_items)
and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)
and cs_bill_customer_sk = c_customer_sk
group by c_last_name, c_first_name
union all
select c_last_name, c_first_name, sum(ws_quantity * ws_list_price) sales
from web_sales, customer, date_dim
where d_year = 2000
and d_moy = 2
and ws_sold_date_sk = d_date_sk
and ws_item_sk in (select item_sk from frequent_ss_items)
and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer)
and ws_bill_customer_sk = c_customer_sk
group by c_last_name, c_first_name
) y
order by c_last_name, c_first_name, sales
limit 100;
with frequent_ss_items as (
select substr(i_item_desc,1,30) itemdesc, i_item_sk item_sk, d_date solddate,
count(*) cnt
from store_sales, date_dim, item
where ss_sold_date_sk = d_date_sk
and ss_item_sk = i_item_sk
and d_year in (2000, 2001, 2002, 2003)
group by substr(i_item_desc,1,30), i_item_sk, d_date
having count(*) > 4
),
max_store_sales as (
select max(csales) tpcds_cmax
from (
select c_customer_sk, sum(ss_quantity * ss_sales_price) csales
from store_sales, customer, date_dim
where ss_customer_sk = c_customer_sk
and ss_sold_date_sk = d_date_sk
and d_year in (2000, 2001, 2002, 2003)
group by c_customer_sk
) x
),
best_ss_customer as (
select c_customer_sk, sum(ss_quantity * ss_sales_price) ssales
from store_sales, customer
where ss_customer_sk = c_customer_sk
group by c_customer_sk
having sum(ss_quantity * ss_sales_price) > (50 / 100.0) * (select * from
max_store_sales)
)
select sum(sales)
from (
select cs_quantity * cs_list_price sales
from catalog_sales, date_dim
where d_year = 2000
and d_moy = 2
and cs_sold_date_sk = d_date_sk
and cs_item_sk in (select item_sk from frequent_ss_items)
and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)
union all
select ws_quantity * ws_list_price sales
from web_sales, date_dim
where d_year = 2000
and d_moy = 2
and ws_sold_date_sk = d_date_sk
and ws_item_sk in (select item_sk from frequent_ss_items)
and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer)
) y
limit 100;
{code}
Results:
{code:java}
2025-04-14 11:32:35 [Exception] ERROR: Query 6f49d6e7f07b59f9:0b1573c500000000
failed:
Inconsistent tuple cache found: Result '[(Steady, responsible controls h 13041
2001-12-16 1)]' of file
'/impala/Impala/tmp/tuple-cache-debug-dump/cdaeff0d6c317810492cba1d369ee7d5_3896506291/6f49d6e7f07b59f9:0b1573c50000000b_104.bad'
doesn't exist in the reference file:
'/impala/Impala/tmp/tuple-cache-debug-dump/cdaeff0d6c317810492cba1d369ee7d5_3896506291/6f49d6e7f07b59f9:0b1573c50000000b_104_8049b4cb6d5699c1:d62a35960000000b_104_ref.bad'.
{code}
> test_tuple_cache_tpc_queries.py intermittently shows errors for TPC-DS queries
> ------------------------------------------------------------------------------
>
> Key: IMPALA-13964
> URL: https://issues.apache.org/jira/browse/IMPALA-13964
> Project: IMPALA
> Issue Type: Task
> Components: Backend, Frontend
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Priority: Critical
>
> test_tuple_cache_tpc_queries.py failed with some tuple cache correctness
> verification errors on some TPC-DS queries. For example:
> {noformat}
> query_test.test_tuple_cache_tpc_queries.TestTupleCacheTpcdsQuery.test_tpcds[protocol:
> beeswax | table_format: parquet/none | exec_option: {'test_replan': 1,
> 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}-0-tpcds-decimal_v2-q72]
> E Inconsistent tuple cache found: Result '[(11581 80525 327 2452547 2452588
> 197 145263 69)]' of file
> '/data/jenkins/workspace/tmp/impala-tuplecache-debugdump-2/tuple-cache-debug-dump/1bc6486bcce556626e0e1705bd7f9578_3685260755/314cc074d3b8c64c:c66d4d0300000003_37.bad'
> doesn't exist in the reference file:
> '/data/jenkins/workspace/tmp/impala-tuplecache-debugdump-2/tuple-cache-debug-dump/1bc6486bcce556626e0e1705bd7f9578_3685260755/314cc074d3b8c64c:c66d4d0300000003_37_2840c3f9468fae3d:ac5f338400000003_37_ref.bad'.{noformat}
> This showed up in a nightly job. There were also failures for Q72, Q97,
> Q23-1, Q23-2. This does not reproduce on my development machine.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]