Federico Sacerdoti wrote:
On Tuesday, September 24, 2002, at 10:49 AM, Steven Wagner wrote:
People have cited disk I/O as a bottleneck. I personally doubt this.
If it were true, you'd be seeing random gaps whenever RRD updates came
thick and fast (i.e. while all threads were updating RRDs at once),
and the failures should be at least a bit more distributed - not just
in one cluster.
So we noticed that disk I/O was a bottleneck on the old version of
gmetad. We had ~400 hosts, each with ~25 RRDs (including summaries) of
12 KB each, all being updated once every 15 seconds. That's 10,000 file
updates every 15 seconds, and since their aggregate size was 120 MB, it
exceeded Linux's filesystem cache.
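For anyone who wants to redo that arithmetic for their own cluster, here is a rough sketch. The function name and parameters are made up for illustration and are not part of gmetad; it uses decimal units (1 MB = 1000 KB) to match the round figures above.

```python
def rrd_update_load(hosts, rrds_per_host, rrd_size_kb, interval_s):
    """Estimate the RRD write load for one polling interval.

    Hypothetical helper for illustration only; not a gmetad API.
    """
    files = hosts * rrds_per_host              # RRD files touched per interval
    total_mb = files * rrd_size_kb / 1000      # aggregate size, decimal MB
    updates_per_s = files / interval_s         # average file updates per second
    return files, total_mb, updates_per_s


# Figures quoted above: ~400 hosts, ~25 RRDs each, 12 KB per RRD, 15 s interval.
files, total_mb, updates_per_s = rrd_update_load(400, 25, 12, 15)
print(f"{files} files, {total_mb:.0f} MB total, {updates_per_s:.0f} updates/s")
```

Whether a given total fits in the filesystem cache depends on how much RAM the gmetad host has free, so treat the result as a ballpark only.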
Erm, actually I meant "I doubt this is the case for you," as the stated
number of hosts/clusters seemed small (and besides it wasn't occurring in
2.4.x, which records more metrics per host).
Bad day for me. :S