2003-09-05T04:41:22 James Stevens:
> Forking for each incoming connection could work out expensive,
> [...]
[...] but not badly so at all on Linux. Other OSes aren't so swift; I can
damn near fork (and context switch) faster than Solaris:-).

> A more sophisticated variation would be to load the database, open the
> main socket, then fork (say) 5 child processes all running a blocking
> "accept"; one of the child processes (whoever happens to have CPU at the
> time) will then be given the incoming connection and scan the data.

I've done this, and it works out _magnificently_ for email content
scanning. An email content scanner is sandwiched between two bits of MTA,
so the MTA gets to have the Big Picture control over concurrency
management.

In my code, the master binds, then forks off N children (with quick naps
between, to keep from killing the system), then goes to sleep, waiting
for a child to exit. On healthy OSes the children just jump right into
accept on the socket, letting the OS dispatch connections to children as
it wishes. On sick, sick platforms this produces errors, so there the
children dispatch off a semaphore so that only one child is attempting to
accept at a time.

> The child could then either die (and be re-started by the master) or go
> into another blocking "accept". You could then allow the child to scan,
> say 10 jobs, before it dies (and is re-started by the master). This is
> basically how Apache works.

I had a configurable minjobs and maxjobs (defaults 100 and 200; sounds
like clamd might want to start a little lower:-), and each child rolled a
random number uniformly between those two and serviced just that many
before exiting. This schmeared the child exits out over plenty of time,
so the master didn't suddenly find all its children gone and service
stalled until it could re-fork 'em.

> Apache is slightly more sophisticated still, [...]

Indeed, but it's solving a harder problem: adapting gracefully to the
fractally chaotic load a public webserver gets. An email content analyzer
can be presented a far, far better conditioned load by its surrounding
MTA.
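For anyone who wants the shape of that in code, here is a rough sketch of the pre-fork scheme described above. It is in Python rather than the perl of my proxy, and it is not lifted from it: the names pick_quota, child_loop, spawn_child and master are made up for illustration, as is the optional accept_lock argument, which stands in for the semaphore on platforms that choke on concurrent accept. It assumes a Unix-like system with fork.

```python
import contextlib
import os
import random
import time


def pick_quota(minjobs=100, maxjobs=200):
    """Jobs a child serves before exiting: uniform in [minjobs, maxjobs]."""
    return random.randint(minjobs, maxjobs)


def child_loop(listener, quota, handle, accept_lock=None):
    """Accept and service exactly `quota` connections, then return."""
    # Hypothetical accept_lock: any context manager shared across the fork.
    lock = accept_lock if accept_lock is not None else contextlib.nullcontext()
    for _ in range(quota):
        with lock:  # only needed where concurrent accept() misbehaves
            conn, _addr = listener.accept()  # blocks; the OS picks a child
        with conn:
            handle(conn)


def spawn_child(listener, handle, minjobs, maxjobs, accept_lock=None):
    """Fork one worker and return its pid to the parent."""
    pid = os.fork()
    if pid == 0:
        status = 1  # reported if the handler blows up
        try:
            random.seed()  # reseed so siblings do not roll identical quotas
            quota = pick_quota(minjobs, maxjobs)
            child_loop(listener, quota, handle, accept_lock)
            status = 0
        finally:
            os._exit(status)  # never fall back into the master's code
    return pid


def master(listener, handle, nchildren=5, minjobs=100, maxjobs=200,
           nap=0.1, accept_lock=None):
    """Fork nchildren workers, then replace each one as it exits (forever)."""
    for _ in range(nchildren):
        spawn_child(listener, handle, minjobs, maxjobs, accept_lock)
        time.sleep(nap)  # quick nap between forks to go easy on the system
    while True:
        os.wait()  # sleep until some child exits, for whatever reason
        spawn_child(listener, handle, minjobs, maxjobs, accept_lock)
```

To use it, you'd hand master() a socket that is already bound and listening, plus a handle(conn) callable that does the actual scanning. A multiprocessing.Lock() created before the forks ought to serve as accept_lock where one is needed. A child that crashes in handle exits nonzero and gets replaced like any other, so one bad message costs one child, not the service.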
> The master should also have a SIGALRM back stop, so that if it
> locks up, it dies. The master would then be run through inittab so
> that it is always immediately re-started.

That far I don't go; if a simple networking parent can't remain stable
and alive, I'll hunt it down and fix it. Or delete it. This reminds me of
djb's daemontools, where absolutely rock-solid daemons like dnscache and
tinydns are run under a respawner that's run under an init-replacement
respawner that's run under init to make sure it's respawned as
necessary.... Thanks anyway, I run my djbdns components out of init
scripts:-).

> This would give a really bullet proof scanning service and allow
> for a reasonable level of leaking / bugs in the scanning process
> itself.

Arranging to have the mime-hacker process a bounded number of jobs before
exiting, and having crashes in it not deny the whole service, is
definitely appropriate; MIME parsing is an impossible job to do
completely correctly, and is a fiendishly difficult job to do even
usefully competently. MIME is blecherous.

I'm less excited by massive efforts to carefully arrange for the
networking parent to be supervised and monitored and restarted if
necessary, and for the supervisor that monitors that process to be so
monitored, etc. If the parent process that bound the socket and forks the
children should die, my MTA monitoring will set off alarms (that's only
one of a class of possible environmental problems that could give it
constipation), and I'll figure out what happened and fix it.

Oh, and about my code: if anybody wants it for anything you're welcome to
it, <URL:http://bent.latency.net/smtpprox/>, but as it's an SMTP proxy
written in perl, it probably isn't directly useful to clamd developers;
the above description likely has all the goodie you'd be able to get out
of the perl.

-Bennett
