[
https://issues.apache.org/jira/browse/IMPALA-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Rawat updated IMPALA-12039:
------------------------------------
Description:
There is a race condition between the admission controller and the deletion of
executors or executor groups. If a query comes in at the wrong moment, it can
be admitted to an executor group that was just deleted, and the query fails.
{code:java}
I0330 06:05:25.600728 9398 admission-controller.cc:1941] 3c4f9069df52951e:0b97d92800000000] Trying to admit id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 max_mem=48828.12 GB is_trivial_query=false
I0330 06:05:25.600769 9398 admission-controller.cc:1950] 3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, local_trivial_running=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0330 06:05:25.600816 9398 admission-controller.cc:1300] 3c4f9069df52951e:0b97d92800000000] Admitting query id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.600883 9398 impala-server.cc:2231] 3c4f9069df52951e:0b97d92800000000] Registering query locations
I0330 06:05:25.600898 9398 coordinator.cc:151] 3c4f9069df52951e:0b97d92800000000] Exec() query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from test_a9a41a5.t where id + random() < sleep(10000)
I0330 06:05:25.601054 9398 coordinator.cc:476] 3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for query_id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.601359 124 control-service.cc:148] 3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): query_id=3c4f9069df52951e:0b97d92800000000 coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000 #instances=1
I0330 06:05:25.601604 117 kudu-status-util.h:55] Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111)
E0330 06:05:25.601706 117 coordinator-backend-state.cc:190] ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111)
{code}
was:
IMPALA-11891 added support for deleting executor groups when they are empty.
However, there is a race condition here: if a query comes in, it could be
admitted to a just-deleted executor group, and the query fails.
{code:java}
I0330 06:05:25.600728 9398 admission-controller.cc:1941]
3c4f9069df52951e:0b97d92800000000] Trying to admit
id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default
executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB
dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200
max_mem=48828.12 GB is_trivial_query=false
I0330 06:05:25.600769 9398 admission-controller.cc:1950]
3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0,
agg_mem_reserved=0, local_host(local_mem_admitted=0, local_trivial_running=0,
num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats:
queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0;
pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0,
average_per_query=0)
I0330 06:05:25.600816 9398 admission-controller.cc:1300]
3c4f9069df52951e:0b97d92800000000] Admitting query
id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.600883 9398 impala-server.cc:2231]
3c4f9069df52951e:0b97d92800000000] Registering query locations
I0330 06:05:25.600898 9398 coordinator.cc:151]
3c4f9069df52951e:0b97d92800000000] Exec()
query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from
test_a9a41a5.t where id + random() < sleep(10000)
I0330 06:05:25.601054 9398 coordinator.cc:476]
3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for
query_id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.601359 124 control-service.cc:148]
3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances():
query_id=3c4f9069df52951e:0b97d92800000000
coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000
#instances=1
I0330 06:05:25.601604 117 kudu-status-util.h:55] Exec() rpc failed: Network
error: Client connection negotiation failed: client connection to
192.168.112.16:27010: connect: Connection refused (error 111)
E0330 06:05:25.601706 117 coordinator-backend-state.cc:190]
ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed:
Exec() rpc failed: Network error: Client connection negotiation failed: client
connection to 192.168.112.16:27010: connect: Connection refused (error 111)
{code}
In the past, the empty executor group would have been marked unhealthy and the
admission controller would have queued the incoming query.
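The race described above, and the earlier queue-instead-of-admit behavior, can
be illustrated with a small sketch. This is not Impala's code: the Membership
class, its method names, and the "ADMITTED"/"QUEUED" return values are all
hypothetical, and the only idea shown is that the admission decision re-checks
the executor group under the same lock that guards group deletion. It does not
cover an executor disappearing after admission.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ExecutorGroup:
    # Hypothetical model of an executor group: a name plus executor addresses.
    name: str
    executors: set = field(default_factory=set)


class Membership:
    """Hypothetical stand-in for cluster membership shared with admission."""

    def __init__(self):
        self._lock = threading.Lock()
        self._groups = {}

    def add_executor(self, group_name, address):
        with self._lock:
            group = self._groups.setdefault(group_name, ExecutorGroup(group_name))
            group.executors.add(address)

    def remove_executor(self, group_name, address):
        with self._lock:
            group = self._groups.get(group_name)
            if group is None:
                return
            group.executors.discard(address)
            if not group.executors:
                # Assumption: an empty group is deleted outright.
                del self._groups[group_name]

    def admit_or_queue(self, query_id, group_name, queue):
        # The group is looked up again while holding the lock that also guards
        # deletion, so the decision cannot be based on a stale group.
        with self._lock:
            group = self._groups.get(group_name)
            if group is None or not group.executors:
                # Missing or empty group: queue the query instead of admitting.
                queue.append(query_id)
                return "QUEUED"
            return "ADMITTED"


membership = Membership()
queued = []
group_name = "root.default-group-000"
membership.add_executor(group_name, "192.168.112.16:27010")
first = membership.admit_or_queue("q1", group_name, queued)
membership.remove_executor(group_name, "192.168.112.16:27010")
second = membership.admit_or_queue("q2", group_name, queued)
```

In this sketch the second query is queued rather than sent to a backend that
would refuse the connection.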
> Potential Race condition between executor group deletion and admission
> controller
> ---------------------------------------------------------------------------------
>
> Key: IMPALA-12039
> URL: https://issues.apache.org/jira/browse/IMPALA-12039
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Abhishek Rawat
> Priority: Critical
>