[jira] [Resolved] (IMPALA-7305) membership entry for failed impalad gets stuck in statestore due to race between failure detection and update processing

Tim Armstrong (JIRA) Mon, 16 Jul 2018 14:34:00 -0700


     [ https://issues.apache.org/jira/browse/IMPALA-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tim Armstrong resolved IMPALA-7305.
-----------------------------------
    Resolution: Fixed

Fixed by the commit referenced below (b0d3433), specifically this part of it:
{code}
-      Topic* topic = &topic_it->second;
-      for (const TTopicItem& item: update.topic_entries) {
-        subscriber->AddTransientUpdate(update.topic_name, item.key,
-            topic->Put(item.key, item.value, item.deleted));
+      Topic& topic = topic_it->second;
+      // Update the topic and add transient entries separately to avoid holding both
+      // locks at the same time and preventing concurrent topic updates.
+      vector<TopicEntry::Version> entry_versions = topic.Put(update.topic_entries);
+      if (!subscriber->AddTransientEntries(
+          update.topic_name, update.topic_entries, entry_versions)) {
+        // Subscriber was unregistered - clean up the transient entries.
+        for (int i = 0; i < update.topic_entries.size(); ++i) {
+          topic.DeleteIfVersionsMatch(entry_versions[i], update.topic_entries[i].key);
+        }
       }
     }
{code}
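
For readers without the surrounding source, the pattern in the diff can be approximated by the following self-contained sketch. It is not the Impala code: Topic, Subscriber, Item and ProcessUpdate here are simplified, hypothetical stand-ins that reuse the method names from the diff, and the real entries carry more state (for example a deleted flag). The idea shown is that the topic is updated first, the subscriber is asked to record the new entries second, and the entries are rolled back by version if the subscriber turns out to have been unregistered in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the statestore types in the diff.
using Version = std::int64_t;

struct Item {
  std::string key;
  std::string value;
};

class Topic {
 public:
  // Adds or overwrites each item and returns the version assigned to each one.
  std::vector<Version> Put(const std::vector<Item>& items) {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<Version> versions;
    for (const Item& item : items) {
      entries_[item.key] = {item.value, ++last_version_};
      versions.push_back(last_version_);
    }
    return versions;
  }

  // Removes 'key' only if it is still at 'version', so that a newer entry
  // written by someone else in the meantime is left alone.
  void DeleteIfVersionsMatch(Version version, const std::string& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = entries_.find(key);
    if (it != entries_.end() && it->second.second == version) entries_.erase(it);
  }

  bool Contains(const std::string& key) {
    std::lock_guard<std::mutex> l(lock_);
    return entries_.count(key) > 0;
  }

 private:
  std::mutex lock_;
  Version last_version_ = 0;
  std::map<std::string, std::pair<std::string, Version>> entries_;
};

class Subscriber {
 public:
  // Records the entries as owned by this subscriber. Returns false and records
  // nothing if the subscriber was already unregistered.
  bool AddTransientEntries(const std::vector<Item>& items,
                           const std::vector<Version>& versions) {
    std::lock_guard<std::mutex> l(lock_);
    if (unregistered_) return false;
    for (std::size_t i = 0; i < items.size(); ++i) {
      transient_[items[i].key] = versions[i];
    }
    return true;
  }

  // Called by failure detection (in this sketch) when the subscriber is gone.
  void Unregister() {
    std::lock_guard<std::mutex> l(lock_);
    unregistered_ = true;
    transient_.clear();
  }

 private:
  std::mutex lock_;
  bool unregistered_ = false;
  std::map<std::string, Version> transient_;
};

// Applies an update without ever holding the topic lock and the subscriber
// lock at the same time.
void ProcessUpdate(Topic& topic, Subscriber& subscriber,
                   const std::vector<Item>& items) {
  std::vector<Version> versions = topic.Put(items);
  if (!subscriber.AddTransientEntries(items, versions)) {
    // Subscriber was unregistered: undo the entries that were just added.
    for (std::size_t i = 0; i < items.size(); ++i) {
      topic.DeleteIfVersionsMatch(versions[i], items[i].key);
    }
  }
}
```

In this sketch, an update from a subscriber that failure detection has already unregistered leaves nothing behind in the topic, which is the behaviour the bug report was missing.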

Commit b0d3433e36d7942b3e10bddc310287266240810b in impala's branch 
refs/heads/master from Tim Armstrong
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b0d3433 ]

IMPALA-4953,IMPALA-6437: separate AC/scheduler from catalog topic updates

This adds a set of "prioritized" statestore topics that are small but
are important to deliver in a timely manner. These are delivered more
frequently by a separate thread pool to reduce the window for stale
admission control and scheduling information.

The contract between statestore and subscriber is changed so that the
statestore can send concurrent Update() RPCs for disjoint sets of
topics. This required changes to the subscriber implementation, which
assumed that only one Update RPC would arrive at a time.

It also changes the locking in the statestore so that the prioritized
update threads don't get stuck behind the catalog threads holding
'topic_lock_'. Specifically, it uses a reader-writer lock to protect
modification of the set of topics and a reader-writer lock per topic to
allow the topic data to be read by multiple threads concurrently.
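
As a rough illustration of the two-level locking described above (not the actual statestore code), the sketch below uses C++17 std::shared_mutex as a stand-in for whatever reader-writer lock the implementation uses. TopicRegistry, EnsureTopic, Put and Get are hypothetical names chosen for this example. The outer lock is taken exclusively only when the set of topics changes; reads and writes of a single topic take the outer lock in shared mode plus that topic's own lock.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical topic: its own reader-writer lock protects 'entries'.
struct Topic {
  std::shared_mutex lock;
  std::map<std::string, std::string> entries;
};

class TopicRegistry {
 public:
  // Creates the topic if it does not exist. This changes the set of topics,
  // so it takes the outer lock exclusively.
  void EnsureTopic(const std::string& name) {
    std::unique_lock<std::shared_mutex> l(topics_lock_);
    std::unique_ptr<Topic>& slot = topics_[name];
    if (!slot) slot = std::make_unique<Topic>();
  }

  // Writes one entry. The outer lock is only shared, so writers to different
  // topics do not block each other; the per-topic lock is exclusive.
  bool Put(const std::string& name, const std::string& key,
           const std::string& value) {
    std::shared_lock<std::shared_mutex> topics_l(topics_lock_);
    auto it = topics_.find(name);
    if (it == topics_.end()) return false;
    std::unique_lock<std::shared_mutex> topic_l(it->second->lock);
    it->second->entries[key] = value;
    return true;
  }

  // Reads one entry. Both locks are shared, so many readers of the same
  // topic can proceed concurrently.
  bool Get(const std::string& name, const std::string& key, std::string* value) {
    std::shared_lock<std::shared_mutex> topics_l(topics_lock_);
    auto it = topics_.find(name);
    if (it == topics_.end()) return false;
    std::shared_lock<std::shared_mutex> topic_l(it->second->lock);
    auto entry = it->second->entries.find(key);
    if (entry == it->second->entries.end()) return false;
    *value = entry->second;
    return true;
  }

 private:
  std::shared_mutex topics_lock_;  // protects the set of topics
  std::map<std::string, std::unique_ptr<Topic>> topics_;
};
```

With this layout, a slow update to one large topic holds only that topic's lock exclusively, so threads working on other topics are not stuck behind it.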

Added metrics to monitor the per-topic update interval.

Testing:
Ran core tests.

Inspected metrics on Impala daemons, saw that membership and request
queue processing times had more samples recorded than the catalog
topic, reflecting the increased frequency.

Ran under thread sanitizer, made sure no data races were reported in
Statestore or StatestoreSubscriber.

Change-Id: Ifc49c2d0f2a5bfad822545616b8c62b4b95dc210
Reviewed-on: http://gerrit.cloudera.org:8080/9123
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins

> membership entry for failed impalad gets stuck in statestore due to race 
> between failure detection and update processing
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7305
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7305
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, 
> Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>         Attachments: 0001-Repro-CDH-70703.patch
>
>
> I was able to reproduce this bug on a version of Impala pre-IMPALA-4953 with 
> the attached patch that adds a sleep. The patch is a hack and only works on 
> my system (it has a name hardcoded). The trick is to kill the third impalad 
> manually while the cluster is starting up.
> The system then gets stuck in a state where all impalads think the impalad 
> on port 22002 is alive even though the process was killed. Queries fail 
> because they keep getting scheduled on the dead impalad.
> {noformat}
> Known backend(s): 3
> Address               Coordinator  Executor
> tarmstrong-box:22002  true         true
> tarmstrong-box:22001  true         true
> tarmstrong-box:22000  true         true
> {noformat}
> The race seems quite exotic but may be possible if there are intermittent 
> transport errors (causing heartbeats to fail) or if there are delays 
> processing topics, e.g. contending for locks.
> IMPALA-4953 fixes the problem by deleting newly-added transient entries if 
> the subscriber got unregistered while the statestore was processing an update.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

