Under some conditions it is possible for a daemon to accept new messages
faster than it can process them.  Since there is no limit on the number
of outstanding messages, if this condition persists the daemon will
consume all available memory/swap on the server.

Fix this by having the message reader monitor bufferlist memory use.
Before reading a message's data payload, it sleeps in 1 ms increments
for as long as the bytes already allocated plus the incoming payload
would exceed 64 MiB.  A more sophisticated technique, one that prevents
more demanding clients from starving less demanding ones, may
eventually be needed.
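A minimal standalone sketch of the throttle described above. It assumes
a single global atomic byte counter standing in for buffer_total_alloc;
AllocCounter and wait_for_buffer_space are hypothetical names, not Ceph
APIs. It uses the same 64 MiB limit and 1 ms polling interval as the
patch, plus one guard the patch lacks: in the patch, a payload larger
than 64 MiB keeps the loop condition true forever, so here a reader
stops waiting once the counter drops to zero. In a real daemon the
counter may never reach zero, so this guard is only an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the global bufferlist byte counter
// (buffer_total_alloc in the patch); not a Ceph API.
struct AllocCounter {
  std::atomic<uint64_t> bytes{0};
  void add(uint64_t n) { bytes.fetch_add(n); }
  void sub(uint64_t n) { bytes.fetch_sub(n); }
  uint64_t read() const { return bytes.load(); }
};

// Polling throttle in the spirit of the patch: before reading a payload
// of data_len bytes, sleep 1 ms at a time while the bytes already in use
// plus the new payload would exceed the limit. Returns the number of
// sleeps, mirroring the patch's "throttled" counter.
unsigned wait_for_buffer_space(const AllocCounter& alloc, uint64_t data_len,
                               uint64_t limit = 64ull * 1024 * 1024) {
  assert(limit > 0);
  unsigned throttled = 0;
  for (;;) {
    uint64_t in_use = alloc.read();
    // Proceed if the payload fits, or if nothing else is allocated
    // (assumed guard so an oversized payload cannot wait forever).
    if (in_use + data_len <= limit || in_use == 0)
      break;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    ++throttled;
  }
  return throttled;
}
```

Polling keeps the change small, as in the patch, but it adds up to 1 ms
of latency per wake-up and does nothing for fairness between clients;
a condition variable signalled on buffer release would avoid the
polling, though not the starvation issue.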

Without this patch, on a filesystem with a single osd, I could reliably
get the osd killed by the kernel OOM killer with a large streaming
write from a single client.  In my case the client machine was several
years newer than the server machine, and the network was 10 Gb/s
Ethernet with a 9000-byte MTU; that configuration probably contributes
to the problem.

With this patch, the same test ran reliably, with the osd RSS staying
below a few hundred MiB.

Signed-off-by: Jim Schutt <[email protected]>
---
 src/msg/SimpleMessenger.cc |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
index a7fb18d..11163e3 100644
--- a/src/msg/SimpleMessenger.cc
+++ b/src/msg/SimpleMessenger.cc
@@ -1750,6 +1750,20 @@ Message *SimpleMessenger::Pipe::read_message()
   unsigned data_off = le32_to_cpu(header.data_off);
   if (data_len) {
     int left = data_len;
+    int throttled = 0;
+
+    while (buffer_total_alloc.read() + data_len > (64*1024*1024)) {
+      struct timespec sleepy_time = {0, (1000 * 1000)}; // 1 msec
+      if (!throttled)
+        dout(4) << " pipe reader paused; buffer_total_alloc "
+                << buffer_total_alloc.read() << dendl;
+      ::nanosleep(&sleepy_time, 0);
+      throttled++;
+    }
+    if (throttled)
+      dout(4) << " pipe reader unpaused; buffer_total_alloc "
+              << buffer_total_alloc.read() << dendl;
+
     if (data_off & ~PAGE_MASK) {
       // head
       int head = MIN(PAGE_SIZE - (data_off & ~PAGE_MASK),
-- 
1.6.6.1
