[ 
https://issues.apache.org/jira/browse/IMPALA-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rawat updated IMPALA-12039:
------------------------------------
    Description: 
There is a race condition between the admission controller and 
executor/executor-group deletion. If a query comes in, it could be admitted to 
a just-deleted executor group, and the query then fails.
{code:java}
I0330 06:05:25.600728  9398 admission-controller.cc:1941] 
3c4f9069df52951e:0b97d92800000000] Trying to admit 
id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default 
executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB 
dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 
max_mem=48828.12 GB is_trivial_query=false

I0330 06:05:25.600769  9398 admission-controller.cc:1950] 
3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, 
agg_mem_reserved=0,  local_host(local_mem_admitted=0, local_trivial_running=0, 
num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: 
queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; 
pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, 
average_per_query=0)

I0330 06:05:25.600816  9398 admission-controller.cc:1300] 
3c4f9069df52951e:0b97d92800000000] Admitting query 
id=3c4f9069df52951e:0b97d92800000000

I0330 06:05:25.600883  9398 impala-server.cc:2231] 
3c4f9069df52951e:0b97d92800000000] Registering query locations

I0330 06:05:25.600898  9398 coordinator.cc:151] 
3c4f9069df52951e:0b97d92800000000] Exec() 
query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from 
test_a9a41a5.t where id + random() < sleep(10000)

I0330 06:05:25.601054  9398 coordinator.cc:476] 
3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for 
query_id=3c4f9069df52951e:0b97d92800000000

I0330 06:05:25.601359   124 control-service.cc:148] 
3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): 
query_id=3c4f9069df52951e:0b97d92800000000 
coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000
 #instances=1

I0330 06:05:25.601604   117 kudu-status-util.h:55] Exec() rpc failed: Network 
error: Client connection negotiation failed: client connection to 
192.168.112.16:27010: connect: Connection refused (error 111)

E0330 06:05:25.601706   117 coordinator-backend-state.cc:190] 
ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: 
Exec() rpc failed: Network error: Client connection negotiation failed: client 
connection to 192.168.112.16:27010: connect: Connection refused (error 111) 
{code}
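
The log above is consistent with a check-then-act race: admission reads the group membership, the group is deleted, and the Exec() RPC then targets a backend that is already gone. A minimal sketch of that interleaving follows. It is a toy model, not Impala code: `ClusterMembership`, `admit`, and `start_execution` are hypothetical names, and only the group name and backend address are taken from the log.

```python
# Hypothetical model of the check-then-act race; none of these names are Impala APIs.
from dataclasses import dataclass, field


@dataclass
class ClusterMembership:
    """Toy stand-in for the coordinator's view of executor groups."""
    groups: dict = field(default_factory=dict)  # group name -> set of backend addresses

    def snapshot(self):
        # Copy so later deletions do not change what the caller already read.
        return {name: set(backends) for name, backends in self.groups.items()}

    def delete_group(self, name):
        self.groups.pop(name, None)


def admit(snapshot, group):
    """Admit against a (possibly stale) snapshot and return the chosen backends."""
    backends = snapshot.get(group, set())
    if not backends:
        return None  # nothing to schedule on: the caller should queue instead
    return backends


def start_execution(membership, group, backends):
    """Simulate the Exec() RPCs: a backend that no longer exists refuses the connection."""
    live = membership.groups.get(group, set())
    return {b: ("ok" if b in live else "connection refused") for b in backends}


GROUP = "root.default-group-000"
membership = ClusterMembership({GROUP: {"192.168.112.16:27010"}})

stale = membership.snapshot()                    # 1. admission reads membership
membership.delete_group(GROUP)                   # 2. the group is deleted concurrently
chosen = admit(stale, GROUP)                     # 3. admission succeeds on stale data
result = start_execution(membership, GROUP, chosen)  # 4. the RPC fails

# Re-reading membership immediately before scheduling sees the deletion.
fresh = admit(membership.snapshot(), GROUP)
```

In this model, re-checking membership right before scheduling narrows the window but does not close it. Closing it would require either serializing deletion with admission or retrying/queuing the query when the RPC fails. Which of these fits Impala is not stated in this issue.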
 

 

  was:
IMPALA-11891 added support for deleting an executor group when it is empty. 
However, there is a race condition: if a query comes in, it could be admitted 
to a just-deleted executor group, and the query then fails.
{code:java}
I0330 06:05:25.600728  9398 admission-controller.cc:1941] 
3c4f9069df52951e:0b97d92800000000] Trying to admit 
id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default 
executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB 
dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 
max_mem=48828.12 GB is_trivial_query=false

I0330 06:05:25.600769  9398 admission-controller.cc:1950] 
3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, 
agg_mem_reserved=0,  local_host(local_mem_admitted=0, local_trivial_running=0, 
num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: 
queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; 
pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, 
average_per_query=0)

I0330 06:05:25.600816  9398 admission-controller.cc:1300] 
3c4f9069df52951e:0b97d92800000000] Admitting query 
id=3c4f9069df52951e:0b97d92800000000

I0330 06:05:25.600883  9398 impala-server.cc:2231] 
3c4f9069df52951e:0b97d92800000000] Registering query locations

I0330 06:05:25.600898  9398 coordinator.cc:151] 
3c4f9069df52951e:0b97d92800000000] Exec() 
query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from 
test_a9a41a5.t where id + random() < sleep(10000)

I0330 06:05:25.601054  9398 coordinator.cc:476] 
3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for 
query_id=3c4f9069df52951e:0b97d92800000000

I0330 06:05:25.601359   124 control-service.cc:148] 
3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): 
query_id=3c4f9069df52951e:0b97d92800000000 
coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000
 #instances=1

I0330 06:05:25.601604   117 kudu-status-util.h:55] Exec() rpc failed: Network 
error: Client connection negotiation failed: client connection to 
192.168.112.16:27010: connect: Connection refused (error 111)

E0330 06:05:25.601706   117 coordinator-backend-state.cc:190] 
ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: 
Exec() rpc failed: Network error: Client connection negotiation failed: client 
connection to 192.168.112.16:27010: connect: Connection refused (error 111) 
{code}
In the past, the empty executor group would have been unhealthy, and the 
admission controller would have queued the incoming query.
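
The earlier behaviour described above (queue rather than admit when the group is unhealthy) can be sketched as follows. This is an illustrative model only: `ToyAdmissionController`, `is_healthy`, and the `min_size` threshold are assumptions, not Impala's actual classes or health definition.

```python
# Hypothetical sketch: queue queries while the executor group is unhealthy.
from collections import deque


def is_healthy(backends, min_size=1):
    """Assumed health rule: a group is healthy if it has at least min_size backends."""
    return len(backends) >= min_size


class ToyAdmissionController:
    def __init__(self):
        self.queue = deque()  # queries waiting for a healthy group
        self.running = []     # queries that were admitted

    def submit(self, query_id, backends):
        """Admit immediately on a healthy group, otherwise queue the query."""
        if is_healthy(backends):
            self.running.append(query_id)
            return "admitted"
        self.queue.append(query_id)
        return "queued"

    def on_membership_update(self, backends):
        """Admit queued queries once the group is healthy; return the ones admitted."""
        admitted = []
        while self.queue and is_healthy(backends):
            query_id = self.queue.popleft()
            self.running.append(query_id)
            admitted.append(query_id)
        return admitted


ctl = ToyAdmissionController()
first = ctl.submit("q1", set())                            # empty group: queued
still_waiting = ctl.on_membership_update(set())            # still empty: nothing admitted
released = ctl.on_membership_update({"executor-0:27010"})  # group is back: q1 admitted
```

With an empty group kept around and treated as unhealthy, the query waits in the queue instead of being scheduled on backends that no longer exist.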

 

 


> Potential Race condition between executor group deletion and admission 
> controller
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12039
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Abhishek Rawat
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)