Source: diamond
Version: 4.0.515-3
Severity: serious

In my setup, diamond takes a long time to stop after getting a SIGTERM
from systemd, thus slowing down the overall shutdown of the system by
about a minute (and also making it slow to perform restarts). When it
finally does stop, an error message about QueueHandler is emitted. The
issue is reproducible without systemd too -- run diamond in one terminal
with --log-stdout --foreground and then run kill from another terminal
(Ctrl+C i.e. SIGINT won't reproduce it). This happens even with a
relatively minimal config, e.g. without any Collectors defined and only
with a single GraphiteHandler.

What I see here is:
root@d-i-test:/etc/diamond# diamond --log-stdout --foreground
1486775151.66   [MainProcess:9336:INFO] Changed UID: 0 () GID: 0 ().
1486775152.06   [MainProcess:9336:DEBUG]        metric_queue_size: 16384
1486775152.06   [MainProcess:9336:DEBUG]        Loading Handler 
diamond.handler.graphite.GraphiteHandler
1486775152.1    [MainProcess:9336:DEBUG]        GraphiteHandler: Established 
connection to graphite server 10.192.16.33:2003.
1486775152.11   [Handlers:9347:DEBUG]   Starting process Handlers
1486775153.12   [TCPCollector:9350:DEBUG]       Starting
1486775153.12   [TCPCollector:9350:DEBUG]       Interval: 60.0 seconds
1486775153.13   [TCPCollector:9350:DEBUG]       Max collection time: 28 seconds
1486775153.16   [TCPCollector:9350:DEBUG]       Collection took 32 ms
1486775154.14   [NetworkCollector:9353:DEBUG]   Starting
1486775154.14   [NetworkCollector:9353:DEBUG]   Interval: 60.0 seconds
1486775154.14   [NetworkCollector:9353:DEBUG]   Max collection time: 19 seconds
1486775154.16   [NetworkCollector:9353:DEBUG]   Collection took 16 ms
1486775155.15   [DiskUsageCollector:9356:DEBUG] Starting
1486775155.15   [DiskUsageCollector:9356:DEBUG] Interval: 60.0 seconds
1486775155.15   [DiskUsageCollector:9356:DEBUG] Max collection time: 48 seconds
1486775155.15   [DiskUsageCollector:9356:DEBUG] Collection took 1 ms
1486775156.16   [CPUCollector:9359:DEBUG]       Starting
1486775156.16   [CPUCollector:9359:DEBUG]       Interval: 60.0 seconds
1486775156.16   [CPUCollector:9359:DEBUG]       Max collection time: 43 seconds
1486775156.17   [CPUCollector:9359:DEBUG]       Collection took 7 ms
<send kill from another terminal>
1486775186.96   [MainProcess:9336:INFO] Signal Received: 15
1486775186.96   [NetworkCollector:9353:INFO]    Signal Received: 15
1486775186.96   [Handlers:9347:INFO]    Signal Received: 15
1486775186.96   [DiskUsageCollector:9356:INFO]  Signal Received: 15
1486775186.96   [TCPCollector:9350:INFO]        Signal Received: 15
1486775186.96   [CPUCollector:9359:INFO]        Signal Received: 15
1486775186.97   [NetworkCollector:9353:INFO]    Signal Received: 15
1486775186.97   [CPUCollector:9359:INFO]        Signal Received: 15
<long wait>
Exception socket.error: error(111, 'Connection refused') in <bound method 
QueueHandler.__del__ of <diamond.handler.queue.QueueHandler object at 
0x7f8fa53bff10>> ignored

While this long wait is happening, strace shows diamond is busylooping
running this over and over (apparently python multiprocessing related):
connect(4, {sa_family=AF_UNIX, sun_path="/tmp/pymp-xq1DKs/listener-8_h02F"}, 
34) = -1 ECONNREFUSED (Connection refused)
close(4)                                = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout)
socket(AF_UNIX, SOCK_STREAM, 0)         = 4
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR)               = 0

Let me know if I can help in any way.

-- System Information:
Debian Release: 9.0
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable'), (50, 'unstable'), (1, 
'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=el_GR.UTF-8, LC_CTYPE=el_GR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Reply via email to