[ 
https://issues.apache.org/jira/browse/IMPALA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4317.
-----------------------------------
    Resolution: Cannot Reproduce

> Single Overloaded Impalad Causes Entire Cluster to Hang
> -------------------------------------------------------
>
>                 Key: IMPALA-4317
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4317
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.5.0
>         Environment: Enterprise CDH 5.7.0, Parcels
> impalad version 2.5.0-cdh5.7.0 RELEASE (build 
> ad3f5adabedf56fe6bd9eea39147c067cc552703)
>            Reporter: Scott Wallace
>            Priority: Major
>         Attachments: cached_clients.png, health.png, load.png, queries.png, 
> threads.png, worker23.png
>
>
> Occasionally we experience heavy load on a single impalad host. This causes 
> the entire cluster to hang and prevents any Impala queries from 
> executing.
> Here's what we observe:
> - Load increases on a single impalad.
> - Query throughput across the entire Impala cluster drops and we cannot get 
>   any queries to execute.
> - The number of running threads continues to increase until we restart the 
>   Impala service.
> - In the impalad logs we see errors connecting to the unhealthy host. Example: 
>   Couldn't open transport for 
>   ux-reporting-engine-worker-23-prod-us-east-1a:22000 (connect() failed: 
>   Connection timed out)
> Questions:
> Why does the entire Impala service become unstable due to the health of a 
> single impalad?
> Theoretically, shouldn't the Impala statestore prevent the unhealthy impalad 
> host from being used and allow queries to be processed by healthy nodes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

