We're finally running 3.4 in production. :) So far things have gone very well. We've had a situation with launching threads and I'd like to float a suggestion to handle it better.
One server was running at around 81MB after it fully started, with around 12 nsd processes. It has now grown to 128MB with 48 nsd processes. I have maxthreads set to 40, which is probably too high.

The server ran for a long time with around 20 processes, then at 11:00 it decided it needed to launch 25 new threads (see below). My assumption is that something unusual happened, like a route flap, which delayed all of the active threads, so the server started a bunch of new ones, up to maxthreads. To prevent this from happening, we should clamp maxthreads to 25 or 30 rather than 40.

As a suggestion, I think it would be good if AS launched threads more gradually, like Apache does (this is from the Apache performance doc):

---
As of Apache 1.3, the code will relax the one-per-second rule. It will spawn one, wait a second, then spawn two, wait a second, then spawn four, and it will continue exponentially until it is spawning 32 children per second. It will stop whenever it satisfies the MinSpareServers setting.

This appears to be responsive enough that it's almost unnecessary to twiddle the MinSpareServers, MaxSpareServers and StartServers knobs. When more than 4 children are spawned per second, a message will be emitted to the ErrorLog. If you see a lot of these errors then consider tuning these settings. Use the mod_status output as a guide.

Related to process creation is process death induced by the MaxRequestsPerChild setting. By default this is 0, which means that there is no limit to the number of requests handled per child. If your configuration currently has this set to some very low number, such as 30, you may want to bump this up significantly. If you are running SunOS or an old version of Solaris, limit this to 10000 or so because of memory leaks.

When keep-alives are in use, children will be kept busy doing nothing waiting for more requests on the already open connection. The default KeepAliveTimeout of 15 seconds attempts to minimize this effect.
The tradeoff here is between network bandwidth and server resources. In no event should you raise this above about 60 seconds, as most of the benefits are lost.
---

With a new MinSpareThreads directive (maybe only set to 1 or 2), an idle thread timeout, and a good "ramp up" algorithm, tuning maxthreads and minthreads would nearly become a non-issue. The only real purpose would be to set an absolute ceiling on how many threads to run on a server you knew was going to be overloaded.

It might be good to have a "ramp up" factor and limit:

0 = launch threads at will (current setup)
1 = launch 1 thread per second until MinSpareThreads are idle
2 = launch 1,2,4,8,16,... threads per second
3 = 1,3,9,27,... threads per second

Jim

# ps aux|grep nsd
nsadmin 18443 0.0 15.2 128480 118568 ? S< 06:49 0:01 bin/nsd -i -t nsd
nsadmin 18446 0.0 15.2 128480 118568 ? S< 06:49 0:00 bin/nsd -i -t nsd
nsadmin 18447 0.0 15.2 128480 118568 ? S< 06:49 0:00 bin/nsd -i -t nsd
nsadmin 18448 0.0 15.2 128480 118568 ? S< 06:49 0:11 bin/nsd -i -t nsd
nsadmin 18449 0.0 15.2 128480 118568 ? S< 06:49 0:02 bin/nsd -i -t nsd
nsadmin 18450 0.0 15.2 128480 118568 ? S< 06:49 0:00 bin/nsd -i -t nsd
nsadmin 18453 0.6 15.2 128480 118568 ? S< 06:49 1:47 bin/nsd -i -t nsd
nsadmin 18454 1.3 15.2 128480 118568 ? S< 06:49 3:29 bin/nsd -i -t nsd
nsadmin 18455 1.3 15.2 128480 118568 ? S< 06:49 3:30 bin/nsd -i -t nsd
nsadmin 18456 1.3 15.2 128480 118568 ? S< 06:49 3:38 bin/nsd -i -t nsd
nsadmin 18459 1.3 15.2 128480 118568 ? S< 06:49 3:39 bin/nsd -i -t nsd
nsadmin 18471 1.2 15.2 128480 118568 ? S< 06:50 3:22 bin/nsd -i -t nsd
nsadmin 18805 1.3 15.2 128480 118568 ? S< 07:00 3:22 bin/nsd -i -t nsd
nsadmin 18806 1.2 15.2 128480 118568 ? S< 07:00 3:06 bin/nsd -i -t nsd
nsadmin 22744 1.1 15.2 128480 118568 ? S< 08:39 1:46 bin/nsd -i -t nsd
nsadmin 24802 1.0 15.2 128480 118568 ? S< 09:19 1:13 bin/nsd -i -t nsd
nsadmin 25795 1.0 15.2 128480 118568 ? S< 09:38 1:02 bin/nsd -i -t nsd
nsadmin 25796 1.1 15.2 128480 118568 ? S< 09:38 1:05 bin/nsd -i -t nsd
nsadmin 25797 1.1 15.2 128480 118568 ? S< 09:38 1:04 bin/nsd -i -t nsd
nsadmin 25798 1.0 15.2 128480 118568 ? S< 09:38 0:58 bin/nsd -i -t nsd
nsadmin 28368 1.0 15.2 128480 118568 ? S< 10:26 0:28 bin/nsd -i -t nsd
nsadmin 30225 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30226 0.4 15.2 128480 118568 ? S< 11:00 0:03 bin/nsd -i -t nsd
nsadmin 30227 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30228 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30229 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30230 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30231 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30233 0.5 15.2 128480 118568 ? S< 11:00 0:04 bin/nsd -i -t nsd
nsadmin 30234 0.2 15.2 128480 118568 ? S< 11:00 0:01 bin/nsd -i -t nsd
nsadmin 30237 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30238 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30239 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30240 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30241 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30242 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30243 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30246 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30247 0.7 15.2 128480 118568 ? S< 11:00 0:06 bin/nsd -i -t nsd
nsadmin 30248 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30249 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30250 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30251 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30252 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30253 0.2 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30254 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
nsadmin 30255 0.3 15.2 128480 118568 ? S< 11:00 0:02 bin/nsd -i -t nsd
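
To make the "ramp up" factor idea concrete, here is a rough sketch in Python of how the launch pacing might work. It is only an illustration, not anything that exists in the server today: the names `threads_to_launch`, `ramp_schedule` and `per_tick_cap` are made up for this example, `deficit` stands for however many threads are still needed to reach MinSpareThreads (already clamped by maxthreads), and the default cap of 32 is simply borrowed from the Apache text quoted above. Factor 0 is modelled as "start everything at once", which is my reading of the current behavior.

```python
def threads_to_launch(factor, tick, deficit, per_tick_cap=32):
    """How many threads to start during one-second tick number `tick` (0-based).

    factor 0  -> no pacing, start everything that is needed right away
    factor 1  -> one thread per second
    factor N  -> N**tick threads per second (1,2,4,... for 2; 1,3,9,... for 3)
    `deficit` is the number of threads still needed (hypothetical input,
    assumed to be already limited by maxthreads).
    """
    if factor < 0:
        raise ValueError("ramp-up factor must be >= 0")
    if deficit <= 0:
        return 0
    if factor == 0:
        return deficit
    burst = 1 if factor == 1 else factor ** tick
    # Never start more than is needed, and never more than the per-second cap.
    return min(deficit, burst, per_tick_cap)


def ramp_schedule(factor, deficit, per_tick_cap=32):
    """Launch counts per second until `deficit` threads have been started."""
    schedule = []
    tick = 0
    while deficit > 0:
        n = threads_to_launch(factor, tick, deficit, per_tick_cap)
        schedule.append(n)
        deficit -= n
        tick += 1
    return schedule


if __name__ == "__main__":
    for factor in (0, 1, 2, 3):
        print(factor, ramp_schedule(factor, 25))
```

Under these assumptions, a burst of 25 threads with factor 2 would be spread over five seconds instead of landing all at once, and the server could stop early as soon as enough threads were idle again.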
