[jira] [Comment Edited] (IMPALA-13024) Several tests timeout waiting for admission

Csaba Ringhofer (Jira) Sun, 21 Apr 2024 01:16:06 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839337#comment-17839337
 ]


Csaba Ringhofer edited comment on IMPALA-13024 at 4/21/24 8:15 AM:
-------------------------------------------------------------------

>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:

Run one query with more fragment instances than the core count in one impala-shell:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; -- 

Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 60000ms in pool default-pool. 
Queued reason: Not enough admission control slots available on host 
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable
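
When triaging many failed tests, the slot figures can be pulled out of this message automatically. The following is a minimal Python sketch, not part of Impala or its test framework; the function name parse_slot_reason is hypothetical, and the pattern assumes the message wording stays exactly as quoted above.

```python
import re

# Pattern mirrors the wording of the error quoted above; it is an assumption
# that other Impala versions phrase the message the same way.
SLOT_REASON = re.compile(
    r"Not enough admission control slots available on host\s+(?P<host>\S+)\.\s+"
    r"Needed\s+(?P<needed>\d+)\s+slots\s+but\s+(?P<in_use>\d+)/(?P<total>\d+)\s+"
    r"are\s+already\s+in\s+use"
)


def parse_slot_reason(message):
    """Return the slot figures from a queued-reason message, or None if absent."""
    # Collapse line wraps so the pattern also matches mail-wrapped text.
    flat = " ".join(message.split())
    match = SLOT_REASON.search(flat)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "needed": int(match.group("needed")),
        "in_use": int(match.group("in_use")),
        "total": int(match.group("total")),
    }
```

A result where in_use is larger than total (32 versus 24 here) shows that the host was already oversubscribed before the queued query arrived.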

UPDATE:
I understand now what is happening: the slot limit is enforced only on 
coordinator-only queries.
While "select * from alltypestiny" failed, the much larger "select * from 
alltypes" ran without issues. The reason is that the former query runs on a 
single node (the coordinator), so it is subject to the slot check.
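
The behaviour described above can be illustrated with a toy model. This is a Python sketch of what was observed in the reproduction, not Impala's admission controller code; the names Host, can_admit and admit are hypothetical, and the rule "check slots only for coordinator-only queries" is an assumption taken from the observation in this comment.

```python
from dataclasses import dataclass


@dataclass
class Host:
    total_slots: int       # 24 in the reproduction above
    slots_in_use: int = 0


def can_admit(host, slots_needed, coordinator_only):
    """Toy slot check that is applied only to coordinator-only queries."""
    if not coordinator_only:
        # Assumption from the observation above: with the default executor
        # group, multi-node queries are not rejected by the slot check.
        return True
    return host.slots_in_use + slots_needed <= host.total_slots


def admit(host, slots_needed, coordinator_only):
    """Admit the query if allowed, record its slots, and return the decision."""
    allowed = can_admit(host, slots_needed, coordinator_only)
    if allowed:
        host.slots_in_use += slots_needed
    return allowed


if __name__ == "__main__":
    host = Host(total_slots=24)
    print(admit(host, 32, coordinator_only=False))  # mt_dop=32 query
    print(admit(host, 1, coordinator_only=True))    # alltypestiny
    print(admit(host, 1, coordinator_only=False))   # alltypes
```

In this model the mt_dop=32 query is admitted without a check but still occupies 32 slots, so the later single-slot coordinator-only query sees 32/24 in use and has to queue, while the multi-node query is let through.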

From impalad.INFO:
"I0421 10:10:57.505287 1586078 admission-controller.cc:1962] Trying to admit 
id=91442a9fa1d2512d:db5337c200000000 in pool_name=default-pool 
executor_group_name=empty group (using coordinator only) 
per_host_mem_estimate=20.00 MB dedicated_coord_mem_estimate=120.00 MB 
max_requests=-1 max_queued=200 max_mem=-1.00 B is_trivial_query=false
I0421 10:10:57.505345 1586078 admission-controller.cc:1971] Stats: 
agg_num_running=1, agg_num_queued=1, agg_mem_reserved=4.02 MB,  
local_host(local_mem_admitted=516.57 MB, local_trivial_running=0, 
num_admitted_running=1, num_queued=1, backend_mem_reserved=4.02 MB, 
topN_query_stats: queries=[d84f2a7efee0998a:45ac120600000000], 
total_mem_consumed=4.02 MB, fraction_of_pool_total_mem=1; pool_level_stats: 
num_running=1, min=4.02 MB, max=4.02 MB, pool_total_mem=4.02 MB, 
average_per_query=4.02 MB)
I0421 10:10:57.505407 1586078 admission-controller.cc:2227] Could not dequeue 
query id=91442a9fa1d2512d:db5337c200000000 reason: Not enough admission control 
slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 
are already in use."



was (Author: csringhofer):
>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:

Run one query with more fragment instances than the core count in one impala-shell:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; -- 

Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 60000ms in pool default-pool. 
Queued reason: Not enough admission control slots available on host 
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable


> Several tests timeout waiting for admission
> -------------------------------------------
>
>                 Key: IMPALA-13024
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13024
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Csaba Ringhofer
>            Priority: Critical
>
> A bunch of seemingly unrelated tests failed with the following message:
> Example: 
> query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol:
>  beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> {code}
> ImpalaBeeswaxException: E    Query aborted:Admission for query exceeded 
> timeout 60000ms in pool default-pool. Queued reason: Not enough admission 
> control slots available on host ... . Needed 1 slots but 18/16 are already in 
> use. Additional Details: Not Applicable
> {code}
> This happened in an ASAN build. Another test also failed which may be related 
> to the cause:
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>  
> {code}
> Timeout: query 'e1410add778cd7b0:c40812b900000000' did not reach one of the 
> expected states [4], last known state 5
> {code}
> test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

