Re: Reducing I/O load of logging

Michael Fortson Thu, 12 Feb 2009 17:19:36 -0800

making the iostat hopefully more legible:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda             580.32   116.43 2094.73   11.76 359234.08  1049.90   171.04
    6.56    3.12   0.48 101.44
dm-0              0.00     0.00 2675.25  127.99 359234.08  1023.94   128.51
   35.64   12.82   0.36 101.44
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00


Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda             242.11   407.49  776.72  245.34 148764.37  5222.67   150.66
    2.88    2.85   0.48  48.83
dm-0              0.00     0.00 1018.22  652.63 148660.73  5221.05    92.10
   10.99    6.59   0.29  48.83
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               3.46   211.79    7.32   83.94   102.44  2365.85    27.05
  0.05    0.51   0.37   3.39
dm-0              0.00     0.00   10.77  295.73   102.44  2365.85     8.05
  0.12    0.40   0.11   3.37
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

____________________________________________________________
_____________________________________________

On Thu, Feb 12, 2009 at 5:14 PM, Michael Fortson <[email protected]> wrote:
> I'm confused :)
>
> I stopped logging so much in haproxy, but I get the same thing if I
> grep the nginx logs on this server: haproxy's mongrel backend checks
> start failing. I've noticed it only happens when using httpchk (or at
> least it happens much, much more quickly).
>
> Here's an iostat I ran -- the first two are during the grep on the
> nginx logs; the last one is after I finished:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sda             580.32   116.43 2094.73   11.76 359234.08  1049.90
> 171.04     6.56    3.12   0.48 101.44
> dm-0              0.00     0.00 2675.25  127.99 359234.08  1023.94
> 128.51    35.64   12.82   0.36 101.44
> dm-1              0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00   0.00   0.00
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sda             242.11   407.49  776.72  245.34 148764.37  5222.67
> 150.66     2.88    2.85   0.48  48.83
> dm-0              0.00     0.00 1018.22  652.63 148660.73  5221.05
> 92.10    10.99    6.59   0.29  48.83
> dm-1              0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00   0.00   0.00
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sda               3.46   211.79    7.32   83.94   102.44  2365.85
> 27.05     0.05    0.51   0.37   3.39
> dm-0              0.00     0.00   10.77  295.73   102.44  2365.85
> 8.05     0.12    0.40   0.11   3.37
> dm-1              0.00     0.00    0.00    0.00     0.00     0.00
> 0.00     0.00    0.00   0.00   0.00
>
>
> Top showed cpu use around 16% for grep. There was still ram available.
> It's an 8 core machine.
>
>
> Here's a bit out of the haproxy config:
>
> global
>        log 127.0.0.1 local0 notice #notice
>        #log 127.0.0.1 local1 alert
>
>        daemon
>        nbproc 1
>        maxconn 32000
>        ulimit-n 65536
>        # and uid, gid, daemon, maxconn...
>
> defaults
>        # setup logging
>        log global
>        option tcplog
>        option httplog
>        option dontlognull
>
>        # setup statistics pages
>        stats   enable
>        stats uri  /haproxy_stats
>        stats refresh 10
>        stats auth    xxxxxxxxxxxxxx
>
>
>        mode            http
>        retries         20
>        option          abortonclose
>        option          redispatch
>
>
>        timeout connect 10  # time waiting for backend server to accept
>        timeout client  80s  # data from client (browser)
>        timeout server  120s  # data from server (mongrel)
>
> frontend rails *:7000
>        # set connection limits
>        maxconn         700
>        default_backend mongrels
>
>
> backend mongrels
>        fullconn        80
>
>        option httpchk HEAD /videos/heartbeat
>        option abortonclose
>        balance leastconn
>
>        reqrep ^([^\ ]*)\ //videos(.*)     \1\ /videos\2
>
>        server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check
> inter 1s fastinter 200 rise 1 fall 1
>        server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check
> inter 1s fastinter 200 rise 1 fall 1
> ( etc, for 80 instances over 5 servers)
>
>
> I'm not really sure what to try next.
>
>
>
>
>
> On Wed, Feb 11, 2009 at 7:36 PM, John Lauro <[email protected]>
wrote:
>> If you prefix the filename with a -, that tells syslog not to flush the
data
>> to disk immediately (which is only good in case of a kernel crash, as you
>> might not miss a log entry leading to it, ha!), and it is inefficient
>> forcing a sync every log entry when you have a log entry more at least
every
>> few seconds, and much more so when you have hundreds or thousands a
second.
>>
>> Note, this isn't exactly standard in all versions of syslog.  Do a "man
>> syslog.conf" to make sure it applies to your OS, but generally is
standard
>> with Linux, and other operating systems have others ways to allow for
normal
>> buffering.
>>
>> So, instead of a line like:
>> *.debug;mail.none;authpriv.none;cron.none               /var/log/messages
>> Use the following:
>> *.debug;mail.none;authpriv.none;cron.none
-/var/log/messages
>>
>>
>> How are you doing your log rotates?  That seems strange that it would
cause
>> problems, as that should be a relatively light process, unless you are
doing
>> something like compressing the logs when you rotate.  It should be just
be a
>> mv and then send a -HUP to syslogd.  Of course, if a simple grep is
causing
>> issues, then it sounds like you definitely have some sort of resource
>> contention problem.
>>
>>
>>
>>> -----Original Message-----
>>> From: Michael Fortson [mailto:[email protected]]
>>> Sent: Wednesday, February 11, 2009 8:38 PM
>>> To: John Lauro
>>> Cc: James Brady; [email protected]
>>> Subject: Re: Reducing I/O load of logging
>>>
>>> Hi John,
>>>
>>> Can you explain what you mean by making sure there's a dash in the
>>> name? Is this for something like syslog via /var/log/messages?
>>>
>>> We've had some major issues when logrotate kicks in, knocking haproxy
>>> offline (it reports most of the back-end servers as down and starts
>>> throwing 503s). It's a busy site with 80 servers configured in the
>>> backend. Reducing logrotate to 3 hour intervals shortened the
>>> downtimes but it still happened. It does the same thing if I try to
>>> grep the log.
>>>
>>> I've had to turn off logrotate for now...
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 11, 2009 at 10:01 AM, John Lauro
>>> <[email protected]> wrote:
>>> > It shouldn't be too hard for a machine to keep up with logging.  How
>>> are you
>>> > logging?  standard syslog?   Make sure you have - in front of the
>>> filename?
>>> > How connections per second are you logging?
>>> >
>>> >
>>> >
>>> > Haven't done it with Haproxy, but have with other things that
>>> generate tons
>>> > of logs.
>>> >
>>> >
>>> >
>>> > what you could do is dump the logs (don't forget the - as part of the
>>> file
>>> > name) to /dev/shm/ (assuming linux), and then rotate the logs once a
>>> minute
>>> > and process them.  That way, you will not have any disk I/O from the
>>> logs,
>>> > but would increase memory requirements.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > From: James Brady [mailto:[email protected]]
>>> > Sent: Wednesday, February 11, 2009 12:14 PM
>>> > To: [email protected]
>>> > Subject: Reducing I/O load of logging
>>> >
>>> >
>>> >
>>> > The machine we run HAProxy on is approaching its limits in terms of
>>> disk I/O
>>> > due to our debug-level logging.
>>> >
>>> >
>>> >
>>> > The CPU is barely used, memory is no problem at all - the bottleneck
>>> will
>>> > soon be the rate at which we can log to disk. Because the machine is
>>> more
>>> > than adequate in all other areas, we'd really like to reduce the I/O
>>> load of
>>> > logging.
>>> >
>>> >
>>> >
>>> > We are using the debug log output to tell us the response time and
>>> status
>>> > code for various URLs - this is an essential task we can't do
>>> without.
>>> >
>>> >
>>> >
>>> > Is there any way to get this information without logging and
>>> > post-processing every request? Can HAProxy dump out averaged
>>> statistics like
>>> > this? Can we somehow reduce the I/O load by just logging the exact
>>> fields
>>> > we're interested in?
>>> >
>>> >
>>> >
>>> > Many thanks!
>>> >
>>> > James
>>
>>
>>
>

Re: Reducing I/O load of logging

Reply via email to