[https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839337#comment-17839337]
Csaba Ringhofer edited comment on IMPALA-13024 at 4/21/24 8:15 AM:
-------------------------------------------------------------------
>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:
In one impala-shell, run a query with more fragment instances than there are cores:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; --
Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 60000ms in pool default-pool.
Queued reason: Not enough admission control slots available on host
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use.
Additional Details: Not Applicable
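The queued reason can be read as plain slot arithmetic. The Python sketch below is a minimal model under stated assumptions. Each host has a fixed number of admission control slots (24 here, presumably the core count of this machine). The mt_dop=32 query holds 32 of them. A new query is admitted on a host only if the slots in use plus the slots it needs fit within that capacity. `HostSlots`, `can_admit` and `queue_reason` are hypothetical names, not Impala's admission-controller API.

```python
from dataclasses import dataclass


@dataclass
class HostSlots:
    """Hypothetical per-host slot bookkeeping (not Impala's real data structure)."""
    host: str
    capacity: int  # total admission control slots on the host (assumed: core count)
    in_use: int    # slots held by already-admitted queries

    def can_admit(self, needed: int) -> bool:
        # Assumed rule: admit only if the new query's slots fit in what is left.
        return self.in_use + needed <= self.capacity

    def queue_reason(self, needed: int) -> str:
        # Mirrors the wording of the error message quoted above.
        return (f"Not enough admission control slots available on host "
                f"{self.host}. Needed {needed} slots but "
                f"{self.in_use}/{self.capacity} are already in use.")


host = HostSlots("csringhofer-7000-ubuntu:27000", capacity=24, in_use=32)
print(host.can_admit(1))  # False: 32 + 1 > 24
print(host.queue_reason(1))
```

In this model, 32 slots in use already exceed the capacity of 24. Even a one-slot query is therefore queued until the mt_dop=32 query releases its slots, and here the 60000ms admission timeout is hit first. The sketch does not try to model how the first query was admitted beyond capacity.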
UPDATE:
I now understand what is happening: the slot limit is only enforced for
coordinator-only queries.
While "select * from alltypestiny" failed, the much larger "select * from
alltypes" ran without issues. The reason is that the former query runs
on a single node only (the coordinator).
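The behaviour described in this update can be summarised as a small decision rule. The following Python sketch is a minimal model of what was *observed*, assuming the slot check is applied only when the planner picks the coordinator-only "empty group", while distributed queries are admitted regardless of slot usage. It is not Impala's actual code. `Query` and `observed_admission` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    coordinator_only: bool  # "empty group (using coordinator only)" in the log
    slots_needed: int = 1


def observed_admission(query: Query, coord_in_use: int, coord_capacity: int) -> bool:
    """Model of the reported behaviour (an assumption, not Impala's code):
    the slot limit is checked only for coordinator-only queries."""
    if not query.coordinator_only:
        # Observed: the distributed query was admitted despite the shortage.
        return True
    return coord_in_use + query.slots_needed <= coord_capacity


tiny = Query("select * from functional.alltypestiny", coordinator_only=True)
full = Query("select * from functional.alltypes", coordinator_only=False)
print(observed_admission(tiny, coord_in_use=32, coord_capacity=24))  # False: queued
print(observed_admission(full, coord_in_use=32, coord_capacity=24))  # True: admitted
```

The model reproduces the surprising outcome: the smaller query is the one that gets stuck, because only it goes through the slot check.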
From impalad.INFO:
"0421 10:10:57.505287 1586078 admission-controller.cc:1962] Trying to admit
id=91442a9fa1d2512d:db5337c200000000 in pool_name=default-pool
executor_group_name=empty group (using coordinator only)
per_host_mem_estimate=20.00 MB dedicated_coord_mem_estimate=120.00 MB
max_requests=-1 max_queued=200 max_mem=-1.00 B is_trivial_query=false
I0421 10:10:57.505345 1586078 admission-controller.cc:1971] Stats:
agg_num_running=1, agg_num_queued=1, agg_mem_reserved=4.02 MB,
local_host(local_mem_admitted=516.57 MB, local_trivial_running=0,
num_admitted_running=1, num_queued=1, backend_mem_reserved=4.02 MB,
topN_query_stats: queries=[d84f2a7efee0998a:45ac120600000000],
total_mem_consumed=4.02 MB, fraction_of_pool_total_mem=1; pool_level_stats:
num_running=1, min=4.02 MB, max=4.02 MB, pool_total_mem=4.02 MB,
average_per_query=4.02 MB)
I0421 10:10:57.505407 1586078 admission-controller.cc:2227] Could not dequeue
query id=91442a9fa1d2512d:db5337c200000000 reason: Not enough admission control
slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24
are already in use."
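When triaging the test failures in this issue, it can help to pull the slot numbers out of the queue reason automatically. The Python sketch below is a hypothetical helper, not part of Impala or its test framework. It assumes the message wording is exactly as quoted in this thread (it may differ in other Impala versions) and reports how far a host is oversubscribed.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the queue reason text quoted in this thread (assumed wording).
_SLOTS_RE = re.compile(
    r"available on host (?P<host>.*?)\s*\. "
    r"Needed (?P<needed>\d+) slots but "
    r"(?P<in_use>\d+)/(?P<capacity>\d+) are already in use"
)


class SlotShortage(NamedTuple):
    host: str
    needed: int
    in_use: int
    capacity: int

    @property
    def oversubscribed_by(self) -> int:
        # Number of slots in use beyond the host's capacity.
        return max(0, self.in_use - self.capacity)


def parse_slot_shortage(message: str) -> Optional[SlotShortage]:
    """Extract slot numbers from a queue reason, or None if none are present."""
    m = _SLOTS_RE.search(message)
    if m is None:
        return None
    return SlotShortage(m["host"], int(m["needed"]),
                        int(m["in_use"]), int(m["capacity"]))


msg = ("Not enough admission control slots available on host "
       "csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use.")
shortage = parse_slot_shortage(msg)
print(shortage)  # SlotShortage(host='csringhofer-7000-ubuntu:27000', needed=1, in_use=32, capacity=24)
print(shortage.oversubscribed_by)  # 8
```

Applied to the 18/16 message in the issue description, the helper would flag a host oversubscribed by 2 slots. That points to the same pattern as this local reproduction.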
> Several tests timeout waiting for admission
> -------------------------------------------
>
> Key: IMPALA-13024
> URL: https://issues.apache.org/jira/browse/IMPALA-13024
> Project: IMPALA
> Issue Type: Bug
> Reporter: Csaba Ringhofer
> Priority: Critical
>
> A bunch of seemingly unrelated tests failed with the following message:
> Example:
> query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol:
> beeswax | exec_option: {'mt_dop': 1, 'debug_action': None,
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none]
> {code}
> ImpalaBeeswaxException: E Query aborted:Admission for query exceeded
> timeout 60000ms in pool default-pool. Queued reason: Not enough admission
> control slots available on host ... . Needed 1 slots but 18/16 are already in
> use. Additional Details: Not Applicable
> {code}
> This happened in an ASAN build. Another test also failed which may be related
> to the cause:
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>
> {code}
> Timeout: query 'e1410add778cd7b0:c40812b900000000' did not reach one of the
> expected states [4], last known state 5
> {code}
> test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338