[jira] [Commented] (IMPALA-4317) Single Overloaded Impalad Causes Entire Cluster to Hang

Tim Armstrong (Jira) Fri, 19 Jun 2020 13:15:16 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140802#comment-17140802
 ]


Tim Armstrong commented on IMPALA-4317:
---------------------------------------

This is quite old and it sounds like the problem was likely connected to the 
old RPC stack and thread counts. There is some additional work to blacklist 
unhealthy nodes that is relevant.

We can't detect slowness of a single impalad *in general* and blacklist based 
on that, but I think this particular scenario has since been fixed. So I'll 
close out this JIRA.
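
For illustration only, the general idea of blacklisting a node after repeated 
connection failures could be sketched roughly as below. This is not Impala's 
actual implementation: the NodeBlacklist class, its method names, and the 
failure threshold and cooldown values are all hypothetical and chosen just to 
show the shape of the technique.

```python
import time


class NodeBlacklist:
    """Hypothetical helper: temporarily skip hosts after repeated connection failures."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        # All defaults here are illustrative assumptions, not Impala settings.
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._failures = {}       # host -> consecutive failure count
        self._blocked_until = {}  # host -> time after which the host may be retried

    def record_failure(self, host):
        """Count a failed connection; blacklist the host once the threshold is hit."""
        count = self._failures.get(host, 0) + 1
        self._failures[host] = count
        if count >= self.max_failures:
            self._blocked_until[host] = self.clock() + self.cooldown_s

    def record_success(self, host):
        """A successful connection clears the host's failure history."""
        self._failures.pop(host, None)
        self._blocked_until.pop(host, None)

    def is_blacklisted(self, host):
        until = self._blocked_until.get(host)
        if until is None:
            return False
        if self.clock() >= until:
            # Cooldown expired: forget the failures so the host can be retried.
            del self._blocked_until[host]
            self._failures.pop(host, None)
            return False
        return True

    def healthy(self, hosts):
        """Return the hosts that are currently eligible for scheduling."""
        return [h for h in hosts if not self.is_blacklisted(h)]
```

A scheduler using something like this would call record_failure() on errors 
such as "connect() failed: Connection timed out" and pick fragments only from 
healthy(hosts). Note that this only reacts to hard connection failures; it 
does not address the harder problem of detecting a node that is merely slow.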

> Single Overloaded Impalad Causes Entire Cluster to Hang
> -------------------------------------------------------
>
>                 Key: IMPALA-4317
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4317
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.5.0
>         Environment: Enterprise CDH 5.7.0, Parcels
> impalad version 2.5.0-cdh5.7.0 RELEASE (build 
> ad3f5adabedf56fe6bd9eea39147c067cc552703)
>            Reporter: Scott Wallace
>            Priority: Major
>         Attachments: cached_clients.png, health.png, load.png, queries.png, 
> threads.png, worker23.png
>
>
> Occasionally we experience heavy load on a single impalad host. This causes 
> the entire cluster to hang and prevents any Impala queries from executing.
> Here's what we observe:
> -load increases on a single impalad
> -query throughput across the entire impala cluster drops and we cannot get 
> any queries to execute
> -the number of running threads continues to increase until we restart the 
> Impala service
> -in the impalad logs we see errors connecting to the unhealthy host. Example: 
> Couldn't open transport for 
> ux-reporting-engine-worker-23-prod-us-east-1a:22000 (connect() failed: 
> Connection timed out)
> Questions:
> Why does the entire Impala service become unstable due to the health of a 
> single impalad?
> Theoretically, shouldn't the impala statestore prevent the single impalad 
> host from being used and allow queries to be processed by healthy nodes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

