This is an automated email from the ASF dual-hosted git repository.
granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/master by this push:
new 52c5935 [test] Fix ASAN failure when Hive Metastore connections are
retried.
52c5935 is described below
commit 52c59352f8c06be359edb95d2a68ad06252e2031
Author: Grant Henke <[email protected]>
AuthorDate: Thu Feb 20 13:35:33 2020 -0600
[test] Fix ASAN failure when Hive Metastore connections are retried.
I saw an ASAN test failure that occurred when there was a failure
to connect to the Hive Metastore. This may not fix the connection
issue, but it fixes the unsafe ASAN failure and allows the test to
continue.
Below is a sample of the log:
W0220 18:46:15.548344 18002 client.h:351] Failed to connect to Hive
Metastore (127.0.0.1:45269): Network error: failed to open Hive Metastore
connection: socket open() error: Connection refused
I0220 18:46:16.549294 18002 client.cc:56] TSocket::open() error on socket
(after THRIFT_POLL) <Host: 127.0.0.1 Port: 45269>Connection refused
W0220 18:46:16.549479 18002 client.h:351] Failed to connect to Hive
Metastore (127.0.0.1:45269): Network error: failed to open Hive Metastore
connection: socket open() error: Connection refused
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3:
runtime error: left shift of 100 by 26 places cannot be represented in type
'int'
#0 0x7f527299d77b in
kudu::thrift::HaClient<kudu::hms::HmsClient>::Execute(std::function<kudu::Status
(kudu::hms::HmsClient*)>)::'lambda'()::operator()() const
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3
#1 0x7f526e44ead7 in boost::function0<void>::operator()() const
/home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:770:14
#2 0x7f526b6f21f4 in kudu::ThreadPool::DispatchThread()
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/threadpool.cc:685:22
#3 0x7f526b70c992 in boost::_bi::bind_t<void, boost::_mfi::mf0<void,
kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >
>::operator()()
/home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/bind/bind.hpp:1222:16
#4 0x7f526e44ead7 in boost::function0<void>::operator()() const
/home/jenkins-slave/workspace/kudu-master/0/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:770:14
#5 0x7f526b6d812a in kudu::Thread::SuperviseThread(void*)
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/thread.cc:675:3
#6 0x7f5267917183 in start_thread
/build/eglibc-SvCtMH/eglibc-2.19/nptl/pthread_create.c:312
#7 0x7f526742dffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
/home/jenkins-slave/workspace/kudu-master/0/src/kudu/thrift/client.h:204:3 in
Change-Id: I1282ad36027b314d090e5a2dffdc3854002af761
Reviewed-on: http://gerrit.cloudera.org:8080/15256
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
---
src/kudu/thrift/client.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/kudu/thrift/client.h b/src/kudu/thrift/client.h
index 2e684e7..867cf2b 100644
--- a/src/kudu/thrift/client.h
+++ b/src/kudu/thrift/client.h
@@ -31,6 +31,7 @@
#include "kudu/gutil/port.h"
#include "kudu/gutil/ref_counted.h"
#include "kudu/gutil/strings/substitute.h"
+#include "kudu/rpc/rpc.h"
#include "kudu/thrift/ha_client_metrics.h"
#include "kudu/util/async_util.h"
#include "kudu/util/metrics.h"
@@ -48,6 +49,9 @@ class TProtocol;
} // namespace apache
namespace kudu {
+
+using rpc::ComputeExponentialBackoff;
+
namespace thrift {
// Options for a Thrift client connection.
@@ -251,14 +255,10 @@ Status HaClient<Service>::Execute(std::function<Status(Service*)> task) {
if (PREDICT_TRUE(metrics_)) {
metrics_->reconnections_failed->Increment();
}
- // Reconnect failed; retry with exponential backoff capped at 10s and
- // fail the task. We don't bother with jitter here because only the
- // leader master should be attempting this in any given period per
- // cluster.
+ // Reconnect failed; retry with exponential backoff and fail the task.
consecutive_reconnect_failures_++;
reconnect_after_ = MonoTime::Now() +
- std::min(MonoDelta::FromMilliseconds(100 << consecutive_reconnect_failures_),
- MonoDelta::FromSeconds(10));
+ ComputeExponentialBackoff(consecutive_reconnect_failures_);
reconnect_failure_ = std::move(reconnect_status);
return callback(reconnect_failure_);
}
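
For illustration only: the removed expression shifted the int literal 100 by
the failure count before applying the 10s cap, so the shift itself could
overflow once enough failures accumulated (100 << 26 would be 6,710,886,400,
well beyond the 2,147,483,647 maximum of a 32-bit int). Below is a minimal
sketch of one way to keep a capped exponential backoff safe by clamping the
exponent before shifting. CappedBackoffMs, kMaxShift and the default
parameters are hypothetical names chosen for this example. This is not the
implementation of rpc::ComputeExponentialBackoff, which may use a different
base, cap, and jitter.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of Kudu: returns a backoff in milliseconds
// that doubles with each consecutive failure and is capped at max_ms.
// Assumes base_ms is small (well below 2^40) so base_ms << kMaxShift fits
// in an int64_t.
int64_t CappedBackoffMs(int consecutive_failures,
                        int64_t base_ms = 100,
                        int64_t max_ms = 10000) {
  // Clamp the exponent first, so the shift stays in range no matter how
  // many failures have accumulated. Negative counts are treated as zero.
  const int kMaxShift = 20;
  const int shift = std::min(std::max(consecutive_failures, 0), kMaxShift);
  // Apply the cap after the (now safe) shift.
  return std::min(base_ms << shift, max_ms);
}
```

With these defaults the delay grows as 200 ms, 400 ms, 800 ms and so on, and
stays at 10000 ms from the seventh consecutive failure onward, including for
the failure count of 26 seen in the log above.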