There are 9 physical systems involved planet-wide. The number of CPUs per system varies from 2 to 10 depending on which system we're talking about. Some of the systems are old 2084s; some are 2094s. Each system has up to 4 inbound SMTP servers, which feed up to 4 intermediary workers. The intermediary creates a single copy of each mail file for each RCPT TO and sends it on to another worker for delivery (either back out via SMTP, or converted to mainframe mail format and delivered over RSCS). Some of the RCPT TO addresses can be distribution lists, which the intermediary expands in the copy-making process.

The distribution lists can be large, and worse, we sometimes get floods of distlist-bound mail that have more than once pushed the workers handling the ultimate delivery past the 9999 spool file limit. The last time that happened we had 4 delivery workers on most of the larger systems; we've since doubled that on each system.

Combined, these systems are now handling between 30 and 40 million emails a month. A year ago that was more like 15 million, so we're experiencing rapid growth. The per-minute arrival rate is typically between 750 and 1500 worldwide, with much (much) higher rates during distlist-bound-mail floods.

I am currently working on a strategy for dealing with the floods by slowing down the distlist expansion when the number of spool files in a delivery worker's spool gets dangerously high. Part of that is distributing the load intelligently among the workers so as to minimize the chance of any one of them hitting the 9999 limit. Hence my question ...

-- bc
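P.S. In case it helps make the question concrete, here is a rough sketch of the throttle-plus-placement logic I have in mind. It's Python purely for illustration (the real thing would live in the intermediary worker), and the names and thresholds -- SPOOL_SOFT_LIMIT, spool_file_count(), send_copy_to(), etc. -- are stand-ins, not anything that exists today:

import time

SPOOL_HARD_LIMIT = 9999      # the per-user spool file limit we keep hitting
SPOOL_SOFT_LIMIT = 8000      # assumed threshold where expansion starts throttling
THROTTLE_SLEEP = 5           # seconds to pause between copies while throttled
HEADROOM = 100               # don't place new files on a worker this close to the limit


def spool_file_count(worker):
    """Stand-in for querying a delivery worker's current spool file count
    (e.g. something along the lines of CP QUERY FILES for that userid)."""
    raise NotImplementedError


def send_copy_to(worker, rcpt, mail_file):
    """Stand-in for handing one copy of the mail file to a delivery worker."""
    raise NotImplementedError


def pick_delivery_worker(workers):
    """Choose the delivery worker with the fewest spool files, skipping any
    that are within HEADROOM of the hard limit. Returns None if all are full."""
    counts = {w: spool_file_count(w) for w in workers}
    candidates = [w for w in workers if counts[w] < SPOOL_HARD_LIMIT - HEADROOM]
    if not candidates:
        return None
    return min(candidates, key=lambda w: counts[w])


def expand_distlist(mail_file, recipients, workers):
    """Create one copy per recipient, pausing whenever even the least-loaded
    delivery worker is above the soft limit (i.e. during a distlist flood)."""
    for rcpt in recipients:
        worker = pick_delivery_worker(workers)
        while worker is None or spool_file_count(worker) > SPOOL_SOFT_LIMIT:
            time.sleep(THROTTLE_SLEEP)   # back off and let delivery drain the spools
            worker = pick_delivery_worker(workers)
        send_copy_to(worker, rcpt, mail_file)

The point of the "least spool files" pick is that during a flood no single worker gets singled out to absorb the burst, and the soft limit gives the throttle a chance to kick in well before 9999.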
On Wed, Jan 5, 2011 at 1:40 PM, Paul Gilmartin <paulgboul...@aim.com> wrote:
> On Jan 5, 2011, at 10:47, Bob Cronin wrote:
>
>> The arrival rate is always high.
>
> "High" isn't very quantitative. A high rate to a human being might
> be a low rate to a computer.
>
> Are all the workers in the same LPAR? How many CPUs does that
> LPAR have? At what number of workers (per LPAR) do you reach
> a point of diminishing returns, where paging overhead outweighs
> the value of concurrent processing? If all the workers are
> busy 100% of the time, the arrival rate is greater than the
> service rate and the queue(s) will grow without bounds. Many
> such questions should be considered ahead of whatever esthetic
> value lies in randomly distributing the workload.
>
> Of course, if each of your workers competes 1-for-1 with workloads
> of other departments, you can get a bigger share by assigning
> more workers. And they can retaliate by assigning more servers
> to their workloads. This is known as "The Tragedy of the Commons".
>
>> On Wed, Jan 5, 2011 at 12:07 PM, Mark Wheeler <mwheele...@hotmail.com> wrote:
>>
>>> If you have "enough" workers defined, then much of the time there will be
>>> multiple workers with NO spool files. By randomly distributing the load (or
>>> round-robining), you keep all the workers "active" from a VM perspective. If
>>> the arrival rate is high enough, all the workers' working sets would stay in
>>> storage (which could be substantial because you indicated this is a very
>>> large application). If the arrival rate is low, the workers could experience
>>> a lot of thrashing as they continually get paged out and then back in when
>>> new work arrives. Better IMO to use some other algorithm (alphabetical
>>> sort?) to let as many workers as possible stay idle (and eventually paged
>>> out).
>
> -- gil
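To put Gil's arrival-rate-vs-service-rate point in concrete terms, this is the kind of back-of-envelope check I'll be running against our own numbers. The per-worker service rate and the even split across the 9 systems are assumptions for illustration only; the real figures still have to be measured:

peak_arrival = 1500                   # msgs/minute worldwide, from the numbers above
systems = 9
per_system = peak_arrival / systems   # ~167 msgs/minute, assuming an even split
workers = 8                           # delivery workers per larger system, post-doubling
service_rate = 30                     # msgs/minute per worker -- assumed, not measured
utilization = per_system / (workers * service_rate)
print(f"~{per_system:.0f} msgs/min per system, utilization ~{utilization:.2f}")
# While utilization stays under 1.0 the spool queues drain; a distlist flood
# can push the arrival rate far enough past the service rate that the queue
# grows without bound -- which is exactly when the 9999 limit gets hit.

That's also why I want the throttle keyed off the delivery workers' spool file counts rather than the raw arrival rate: the spool depth is what actually tells us the service rate has fallen behind.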