[jira] [Created] (IMPALA-9425) Statestore may fail to report when an impalad has failed

Thomas Tauber-Marshall (Jira) Tue, 25 Feb 2020 16:36:53 -0800

Thomas Tauber-Marshall created IMPALA-9425:
----------------------------------------------


             Summary: Statestore may fail to report when an impalad has failed
                 Key: IMPALA-9425
                 URL: https://issues.apache.org/jira/browse/IMPALA-9425
             Project: IMPALA
          Issue Type: Bug
          Components: Distributed Exec
    Affects Versions: Impala 3.4.0
            Reporter: Thomas Tauber-Marshall
            Assignee: Thomas Tauber-Marshall


If an impalad fails and another is restarted at the same host:port combination 
quickly, the statestore may fail to report to the coordinators that the impalad 
went down.

The reason for this is that in the cluster membership topic, impalads are keyed 
by their statestore subscriber id, which is "impalad@host:port". If the new 
impalad registers itself before a topic update has been generated for a 
particular coordinator, the statestore has no way of knowing that the 
particular key was deleted and then re-added since the last update.

The result is that queries that were running on the impalad that failed may not 
be cancelled by the coordinator until they pass the unresponsive backend 
timeout, which by default is ~12 minutes.

I propose as a solution that we add a concept of uuids for impalads, where each 
impalad will generate its own uuid on startup. This allows us to differentiate 
between different impalads running at the same host:port combination.

It can also be used to simplify some logic in the scheduler and 
ExecutorGroup/ExecutorBlacklist etc. where we currently have data structures 
containing info about impalads that are keyed off host/port combinations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (IMPALA-9425) Statestore may fail to report when an impalad has failed

Reply via email to