Repository: kudu
Updated Branches:
  refs/heads/master ecd67486b -> 6de378296

rpc: hook up a callback for libev fatal errors

In troubleshooting a recent cluster issue, I found that the daemon had
run out of file descriptors. This caused libev to abort(), but the error
message wasn't anywhere obvious since the default implementation just
writes to stderr.

Piping this through to a GLog FATAL is more likely to result in an
obvious log message.

It's difficult to write an automated test for this, but I tested by
setting my ulimit to 10 and running rpc-test. This resulted in:

  F0809 19:03:39.882194 3358 reactor.cc:108] LibEV fatal error: (libev) error creating signal/async pipe: Too many open files [24]

Change-Id: I5fa77237a40f43d6bb82e9f1ceecd31d52268f9d
Reviewed-on: http://gerrit.cloudera.org:8080/7633
Tested-by: Kudu Jenkins
Reviewed-by: Matthew Jacobs <m...@cloudera.com>
Reviewed-by: David Ribeiro Alves <davidral...@gmail.com>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2a99bb3e
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2a99bb3e
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2a99bb3e

Branch: refs/heads/master
Commit: 2a99bb3e5864ae4ae4dd6d2dcdff557fed81aa1d
Parents: ecd6748
Author: Todd Lipcon <t...@apache.org>
Authored: Wed Aug 9 19:01:07 2017 -0700
Committer: Todd Lipcon <t...@apache.org>
Committed: Thu Aug 10 18:47:58 2017 +0000

----------------------------------------------------------------------
 src/kudu/rpc/reactor.cc | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kudu/blob/2a99bb3e/src/kudu/rpc/reactor.cc
----------------------------------------------------------------------
diff --git a/src/kudu/rpc/reactor.cc b/src/kudu/rpc/reactor.cc
index cf63672..b3b7ea2 100644
--- a/src/kudu/rpc/reactor.cc
+++ b/src/kudu/rpc/reactor.cc
@@ -96,6 +96,22 @@ Status ShutdownError(bool aborted) {
       Status::Aborted(msg, "", ESHUTDOWN) :
       Status::ServiceUnavailable(msg, "", ESHUTDOWN);
 }
+
+// Callback for libev fatal errors (eg running out of file descriptors).
+// Unfortunately libev doesn't plumb these back through to the caller, but
+// instead just expects the callback to abort.
+//
+// This implementation is slightly preferable to the built-in one since
+// it uses a FATAL log message instead of printing to stderr, which might
+// not end up anywhere useful in a daemonized context.
+void LibevSysErr(const char* msg) throw() {
+  PLOG(FATAL) << "LibEV fatal error: " << msg;
+}
+
+void DoInitLibEv() {
+  ev::set_syserr_cb(LibevSysErr);
+}
+
 } // anonymous namespace
 
 ReactorThread::ReactorThread(Reactor *reactor, const MessengerBuilder& bld)
@@ -620,6 +636,8 @@ Reactor::Reactor(shared_ptr<Messenger> messenger,
     name_(StringPrintf("%s_R%03d", messenger_->name().c_str(), index)),
     closing_(false),
     thread_(this, bld) {
+  static std::once_flag libev_once;
+  std::call_once(libev_once, DoInitLibEv);
 }
 
 Status Reactor::Init() {
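
For readers who want to try the registration pattern without building Kudu or libev, here is a minimal standalone sketch of the std::call_once idiom used in the Reactor constructor above. Everything in it is hypothetical and for illustration only: SetSysErrCb stands in for ev::set_syserr_cb, Worker stands in for Reactor, and RecordSysErr records the message instead of aborting via PLOG(FATAL) so that the behavior can be observed.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a process-wide error hook such as
// ev::set_syserr_cb(). None of these names exist in libev or Kudu.
using SysErrCallback = std::function<void(const char*)>;

inline SysErrCallback& GlobalSysErrCb() {
  static SysErrCallback cb;
  return cb;
}

// Counts how many times the hook was installed, so the once-only
// behavior can be checked.
inline int& RegistrationCount() {
  static int count = 0;
  return count;
}

inline void SetSysErrCb(SysErrCallback cb) {
  GlobalSysErrCb() = std::move(cb);
  ++RegistrationCount();
}

inline std::string& LastError() {
  static std::string last;
  return last;
}

// The callback in the commit logs at FATAL and never returns. This sketch
// only records the message (an assumption made to keep it observable).
void RecordSysErr(const char* msg) {
  LastError() = std::string("fatal error: ") + msg;
}

void DoInitHook() {
  SetSysErrCb(RecordSysErr);
}

// Stand-in for Reactor: every instance runs the constructor, but the
// function-local static once_flag ensures DoInitHook runs once per process.
class Worker {
 public:
  Worker() {
    static std::once_flag hook_once;
    std::call_once(hook_once, DoInitHook);
  }
};
```

Constructing several Worker objects, from one thread or many, installs the hook exactly once. Keeping the once_flag as a function-local static keeps the initialization next to its only caller and avoids a separate global.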