[ 
https://issues.apache.org/jira/browse/IMPALA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945159#comment-17945159
 ] 

Joe McDonnell commented on IMPALA-13964:
----------------------------------------

The issue comes from caching locations immediately above a streaming 
pre-aggregation with grouping. The output at these locations is not 
deterministic, because the pre-aggregation relies on the finalization phase to 
combine whatever partial results it emits. Suppose we are computing sum(X) 
grouped by Y, with this data arriving at a single node:
{noformat}
Y=A, X=1
Y=B, X=2
Y=A, X=3
Y=B, X=4
Y=A, X=5
Y=B, X=6{noformat}
The streaming preagg for that node is allowed to produce different combinations 
of partial results, as long as they add up to (Y=A, sum(X)=9) and 
(Y=B, sum(X)=12), e.g.
{noformat}
(Y=A, sum(X)=9), (Y=B, sum(X)=12)
(Y=A, sum(X)=1), (Y=B, sum(X)=2), (Y=A, sum(X)=8), (Y=B, sum(X)=10)
(Y=A, sum(X)=4), (Y=B, sum(X)=6), (Y=A, sum(X)=5), (Y=B, sum(X)=6)
(Y=A, sum(X)=1), (Y=B, sum(X)=2), (Y=A, sum(X)=3), (Y=B, sum(X)=4), (Y=A, sum(X)=5), (Y=B, sum(X)=6){noformat}
In practice, I think the main way this can happen is with memory pressure. 
Without more sophisticated correctness checking, these locations won't be 
deterministic.
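
As a rough illustration only (this is not Impala code), the behavior above can be modeled in a few lines of Python. The names streaming_preagg, finalize, and flush_after are hypothetical, and flushing the in-memory groups after a given number of rows is an assumed stand-in for whatever the real operator does under memory pressure. The point of the sketch is that the partial output depends on when flushes happen, while the finalized result does not:

```python
from collections import defaultdict

# The example input from above: (Y, X) pairs arriving at a single node.
ROWS = [("A", 1), ("B", 2), ("A", 3), ("B", 4), ("A", 5), ("B", 6)]


def streaming_preagg(rows, flush_after=()):
    """Toy model of a streaming pre-aggregation computing sum(X) group by Y.

    flush_after is a hypothetical knob: after consuming that many rows, the
    in-memory groups are emitted as partial results and the table is reset.
    """
    partials = []
    groups = {}  # dicts preserve insertion order, so output order is stable
    for count, (y, x) in enumerate(rows, start=1):
        groups[y] = groups.get(y, 0) + x
        if count in flush_after:
            partials.extend(groups.items())
            groups = {}
    partials.extend(groups.items())  # emit whatever is left at end of input
    return partials


def finalize(partials):
    """Toy model of the finalization phase: merge partial sums per group."""
    totals = defaultdict(int)
    for y, partial_sum in partials:
        totals[y] += partial_sum
    return dict(totals)


if __name__ == "__main__":
    for flush_after in [(), (2,), (4,), (2, 4)]:
        partials = streaming_preagg(ROWS, flush_after)
        print(flush_after, partials, "->", finalize(partials))
```

With no flush, a flush after row 2, a flush after row 4, and flushes after rows 2 and 4, this model produces the four partial outputs listed above, and each one finalizes to the same totals. A cache entry captured directly above the pre-aggregation would see the differing partial outputs, not the finalized result.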

> test_tuple_cache_tpc_queries.py intermittently shows errors for TPC-DS queries
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-13964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13964
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend, Frontend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>              Labels: broken-build
>
> test_tuple_cache_tpc_queries.py failed with some tuple cache correctness 
> verification errors on some TPC-DS queries. For example:
> {noformat}
> query_test.test_tuple_cache_tpc_queries.TestTupleCacheTpcdsQuery.test_tpcds[protocol:
>  beeswax | table_format: parquet/none | exec_option: {'test_replan': 1, 
> 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0}-0-tpcds-decimal_v2-q72]
> E   Inconsistent tuple cache found: Result '[(11581 80525 327 2452547 2452588 
> 197 145263 69)]' of file 
> '/data/jenkins/workspace/tmp/impala-tuplecache-debugdump-2/tuple-cache-debug-dump/1bc6486bcce556626e0e1705bd7f9578_3685260755/314cc074d3b8c64c:c66d4d0300000003_37.bad'
>  doesn't exist in the reference file: 
> '/data/jenkins/workspace/tmp/impala-tuplecache-debugdump-2/tuple-cache-debug-dump/1bc6486bcce556626e0e1705bd7f9578_3685260755/314cc074d3b8c64c:c66d4d0300000003_37_2840c3f9468fae3d:ac5f338400000003_37_ref.bad'.{noformat}
> This showed up in a nightly job. There were also failures for Q72, Q97, 
> Q23-1, Q23-2. This does not reproduce on my development machine.



