Hi Florian

On 16/01/2013, at 11:53 PM, Florian Forster <[email protected]> wrote:
> the "network" plugin is using one thread to dispatch values to the
> daemon. If that thread is getting stuck somewhere, received values will
> accumulate in the "to be dispatched" queue. Since the resident set size
> (RSS, memory consumption) of collectd is growing rapidly in this period,
> this is likely happening.

OK

> If the thread is not truly stuck, just delayed a bit (say 100 ms), then
> only 10 values received from the network can be dispatched per second.
> This would seem like "nothing is happening" for sufficiently many files.
> The 5 read threads (the default) can still handle 500 files during the
> normal read interval (10 seconds), seeming like "everything is fine". In
> short, yes, it is possible that this is related to #75.

OK, that makes sense.

>> Yes :-) The device is dm-0. Most of the time it sits around 1,600
>> write ops per second. When the problem occurred it dropped down to
>> around 15 write ops per second. Disk write time decreased from around
>> 1.4 to around 0.2 while the problem was occurring, reflecting a lower
>> load on the disks I presume. After restarting collectd both these
>> figures went back to normal after a few minutes.
>
> 1600 I/O-ops/s is impressive :)

This is another thing that we see happening sometimes:

http://f.cl.ly/items/0N151O3F0k3y3s101n0U/Screen%20Shot%202013-01-17%20at%2011.28.40%20AM.png

This is peaking at 5,200 writes per second after 'the event'. The event
here was a new server being added, causing collectd to create 182 new RRD
files, claiming 3.2 GB of disk space. This seems to have perhaps triggered
issue #75, where the writes are held back in the network plugin for a time,
and then the floodgates are opened and it goes all peaky for a time. Do you
think so?

> How does I/O _bytes_ behave during these times? 15 writes per second
> with 70 MByte each sums up to roughly 1 GByte/s being written … ;)

The IO bytes remain proportionate to the IO ops.
Usually around 6 MB/s, dropped to around 60 kB/s, then back to 6 MB/s. So
yeah, no real change discernible in the average transaction size.

> If this happens again, can you record collectd's I/O, especially which
> files it opens? Something along these lines should do the trick:
>
>   # strace -ttt -e trace=open -o collectd.strace -p $COLLECTD_PID -s 2048

Will do! Thanks.

Cheers

Jesse

_______________________________________________
collectd mailing list
[email protected]
http://mailman.verplant.org/listinfo/collectd
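
For reading through a trace captured with the strace command quoted above, a rough sketch of a summarizer might look like the following. It is not part of collectd. The function name `summarize_opens` is made up for illustration, and the assumed line shape (an epoch timestamp from `-ttt`, then `open("path", flags) = fd`, optionally preceded by a PID column) should be checked against the real `collectd.strace` file before trusting the counts. On newer systems the call may show up as `openat` instead, which this sketch does not handle.

```python
import re
from collections import Counter

# Assumed line shape for `strace -ttt -e trace=open` output, for example:
#   1358380420.123456 open("/some/file.rrd", O_RDWR) = 7
# An optional leading PID column (as written with -f) is tolerated.
OPEN_RE = re.compile(
    r'^(?:\d+\s+)?(?P<ts>\d+\.\d+)\s+open\("(?P<path>(?:[^"\\]|\\.)*)",'
    r'\s*(?P<flags>[^)]*)\)\s*=\s*(?P<ret>-?\d+)'
)


def summarize_opens(lines):
    """Return (opens per second, opens per path, number of failed opens)."""
    per_second = Counter()
    per_path = Counter()
    failed = 0
    for line in lines:
        match = OPEN_RE.match(line)
        if match is None:
            continue  # signal notices, unfinished calls, anything else
        if int(match.group("ret")) < 0:
            failed += 1  # e.g. "= -1 ENOENT (No such file or directory)"
            continue
        # Bucket successful opens by whole second of the -ttt timestamp.
        per_second[int(float(match.group("ts")))] += 1
        per_path[match.group("path")] += 1
    return per_second, per_path, failed
```

Calling `summarize_opens(open("collectd.strace"))` and looking at `per_path.most_common(20)` would show which files are opened most often, and `per_second` would show whether the open rate collapses while the problem is occurring.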
