[ https://issues.apache.org/jira/browse/IMPALA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-4317.
-----------------------------------
Resolution: Cannot Reproduce
> Single Overloaded Impalad Causes Entire Cluster to Hang
> -------------------------------------------------------
>
> Key: IMPALA-4317
> URL: https://issues.apache.org/jira/browse/IMPALA-4317
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.5.0
> Environment: Enterprise CDH 5.7.0, Parcels
> impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
> Reporter: Scott Wallace
> Priority: Major
> Attachments: cached_clients.png, health.png, load.png, queries.png, threads.png, worker23.png
>
>
> Occasionally we experience heavy load on a single impalad host. This causes
> the entire cluster to hang and prevents any Impala queries from
> executing.
> Here's what we observe:
> - Load increases on a single impalad.
> - Query throughput across the entire Impala cluster drops, and we cannot get
> any queries to execute.
> - The number of running threads keeps increasing until we restart the Impala service.
> - In the impalad logs we see errors connecting to the unhealthy host, for example:
> Couldn't open transport for ux-reporting-engine-worker-23-prod-us-east-1a:22000 (connect() failed: Connection timed out)
> Questions:
> Why does the entire Impala service become unstable because of the health of a
> single impalad?
> In theory, shouldn't the Impala statestore stop the unhealthy impalad
> host from being used and let the healthy nodes process queries?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)