We're finally running 3.4 in production. :)

So far things have gone very well.  We did hit one problem with how
the server launches threads, and I'd like to float a suggestion for
handling it better.

One server was running at around 81 MB after it fully started, with
around 12 nsd processes.  It has now grown to 128 MB with 48 nsd
processes.  I have maxthreads set to 40, which is probably too high.
The server ran for a long time with around 20 processes, then at 11:00
it decided it needed to launch about 25 new threads (see the ps output
below).  My assumption is that something unusual happened, like a
route flap, that stalled all of the active threads, so the server
started a bunch of new ones, up to maxthreads.  As a stopgap, lowering
maxthreads to 25 or 30 rather than 40 would at least cap how large
such a burst can get.

As a suggestion, I think it would be good if AOLserver launched threads
more gradually, like Apache does (this is from the Apache performance
doc):

---
   As of Apache 1.3, the code will relax the one-per-second rule. It will
   spawn one, wait a second, then spawn two, wait a second, then spawn
   four, and it will continue exponentially until it is spawning 32
   children per second. It will stop whenever it satisfies the
   MinSpareServers setting.

   This appears to be responsive enough that it's almost unnecessary to
   twiddle the MinSpareServers, MaxSpareServers and StartServers
   knobs. When more than 4 children are spawned per second, a message
   will be emitted to the ErrorLog. If you see a lot of these errors then
   consider tuning these settings. Use the mod_status output as a guide.

   Related to process creation is process death induced by the
   MaxRequestsPerChild setting. By default this is 0, which means that
   there is no limit to the number of requests handled per child. If your
   configuration currently has this set to some very low number, such as
   30, you may want to bump this up significantly. If you are running
   SunOS or an old version of Solaris, limit this to 10000 or so because
   of memory leaks.

   When keep-alives are in use, children will be kept busy doing nothing
   waiting for more requests on the already open connection. The default
   KeepAliveTimeout of 15 seconds attempts to minimize this effect. The
   tradeoff here is between network bandwidth and server resources. In no
   event should you raise this above about 60 seconds, as most of the
   benefits are lost.
---

With a new MinSpareThreads directive (maybe set to only 1 or 2), an
idle thread timeout, and a good "ramp up" algorithm, tuning maxthreads
and minthreads would nearly become a non-issue.  The only real purpose
of maxthreads would be to set an absolute ceiling on how many threads
to run on a server you knew was going to be overloaded.  It might be
good to have a "ramp up" factor and limit:
  0 = launch threads at will (current setup)
  1 = launch 1 thread per second until MinSpareThreads are idle
  2 = launch 1,2,4,8,16,... threads per second
  3 = launch 1,3,9,27,... threads per second
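
To make the factor idea concrete, here is a rough Python sketch of the
schedule each setting would produce.  It is only an illustration, not
AOLserver code: the function name ramp_schedule, the per-second cap
called limit (borrowed from Apache's 32-per-second ceiling), and the
fixed "needed" count are all assumptions.  A real server would
re-check the idle thread count every second and stop as soon as
MinSpareThreads were idle.

```python
def ramp_schedule(factor, needed, limit=32):
    """Return how many threads to launch in each one-second tick.

    factor: 0 = launch everything at once, 1 = one per second,
            2 = 1,2,4,8,..., 3 = 1,3,9,27,...
    needed: total threads wanted (hypothetical fixed target)
    limit:  assumed cap on launches per second
    """
    if needed <= 0:
        return []
    if factor == 0:
        # Current behaviour: no pacing at all.
        return [needed]

    schedule = []
    batch = 1
    remaining = needed
    while remaining > 0:
        launch = min(batch, limit, remaining)
        schedule.append(launch)
        remaining -= launch
        # Grow the next batch by the ramp-up factor, never past the cap.
        batch = min(batch * factor, limit)
    return schedule


if __name__ == "__main__":
    for factor in (0, 1, 2, 3):
        print(factor, ramp_schedule(factor, 25))
```

For a burst of 25 threads, factor 2 would spread the launches over
five seconds and factor 3 over four, rather than starting them all in
the same instant.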

Jim

# ps aux|grep nsd
nsadmin  18443  0.0 15.2 128480 118568 ?     S<   06:49   0:01 bin/nsd -i -t nsd
nsadmin  18446  0.0 15.2 128480 118568 ?     S<   06:49   0:00 bin/nsd -i -t nsd
nsadmin  18447  0.0 15.2 128480 118568 ?     S<   06:49   0:00 bin/nsd -i -t nsd
nsadmin  18448  0.0 15.2 128480 118568 ?     S<   06:49   0:11 bin/nsd -i -t nsd
nsadmin  18449  0.0 15.2 128480 118568 ?     S<   06:49   0:02 bin/nsd -i -t nsd
nsadmin  18450  0.0 15.2 128480 118568 ?     S<   06:49   0:00 bin/nsd -i -t nsd
nsadmin  18453  0.6 15.2 128480 118568 ?     S<   06:49   1:47 bin/nsd -i -t nsd
nsadmin  18454  1.3 15.2 128480 118568 ?     S<   06:49   3:29 bin/nsd -i -t nsd
nsadmin  18455  1.3 15.2 128480 118568 ?     S<   06:49   3:30 bin/nsd -i -t nsd
nsadmin  18456  1.3 15.2 128480 118568 ?     S<   06:49   3:38 bin/nsd -i -t nsd
nsadmin  18459  1.3 15.2 128480 118568 ?     S<   06:49   3:39 bin/nsd -i -t nsd
nsadmin  18471  1.2 15.2 128480 118568 ?     S<   06:50   3:22 bin/nsd -i -t nsd
nsadmin  18805  1.3 15.2 128480 118568 ?     S<   07:00   3:22 bin/nsd -i -t nsd
nsadmin  18806  1.2 15.2 128480 118568 ?     S<   07:00   3:06 bin/nsd -i -t nsd
nsadmin  22744  1.1 15.2 128480 118568 ?     S<   08:39   1:46 bin/nsd -i -t nsd
nsadmin  24802  1.0 15.2 128480 118568 ?     S<   09:19   1:13 bin/nsd -i -t nsd
nsadmin  25795  1.0 15.2 128480 118568 ?     S<   09:38   1:02 bin/nsd -i -t nsd
nsadmin  25796  1.1 15.2 128480 118568 ?     S<   09:38   1:05 bin/nsd -i -t nsd
nsadmin  25797  1.1 15.2 128480 118568 ?     S<   09:38   1:04 bin/nsd -i -t nsd
nsadmin  25798  1.0 15.2 128480 118568 ?     S<   09:38   0:58 bin/nsd -i -t nsd
nsadmin  28368  1.0 15.2 128480 118568 ?     S<   10:26   0:28 bin/nsd -i -t nsd
nsadmin  30225  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30226  0.4 15.2 128480 118568 ?     S<   11:00   0:03 bin/nsd -i -t nsd
nsadmin  30227  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30228  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30229  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30230  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30231  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30233  0.5 15.2 128480 118568 ?     S<   11:00   0:04 bin/nsd -i -t nsd
nsadmin  30234  0.2 15.2 128480 118568 ?     S<   11:00   0:01 bin/nsd -i -t nsd
nsadmin  30237  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30238  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30239  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30240  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30241  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30242  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30243  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30246  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30247  0.7 15.2 128480 118568 ?     S<   11:00   0:06 bin/nsd -i -t nsd
nsadmin  30248  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30249  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30250  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30251  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30252  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30253  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30254  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30255  0.3 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
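
A quick way to spot a burst like this in a listing is to tally the
rows by their START column.  The Python sketch below assumes the
standard "ps aux" layout (USER PID %CPU %MEM VSZ RSS TTY STAT START
TIME COMMAND); the function name starts_by_minute is made up for the
example, and the sample is just three rows copied from the output
above.

```python
from collections import Counter


def starts_by_minute(ps_text):
    """Count `ps aux` rows per START value (the ninth column)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row if present.
        if len(fields) < 11 or fields[0] == "USER":
            continue
        counts[fields[8]] += 1
    return counts


sample = """\
nsadmin  18443  0.0 15.2 128480 118568 ?     S<   06:49   0:01 bin/nsd -i -t nsd
nsadmin  30225  0.2 15.2 128480 118568 ?     S<   11:00   0:02 bin/nsd -i -t nsd
nsadmin  30226  0.4 15.2 128480 118568 ?     S<   11:00   0:03 bin/nsd -i -t nsd
"""

if __name__ == "__main__":
    for start, count in sorted(starts_by_minute(sample).items()):
        print(start, count)
```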
