Eric, would it be possible for you to publish your test setup so others can try it on different hardware and/or tweak the config a bit?
Thanks.
-Moazam

On Fri, Mar 6, 2009 at 8:22 AM, Eric Lambert <[email protected]> wrote:
>
> Hey Adam:
>
> I've done some simple profiling of the spy client and noticed a drop off in
> performance when the number of threads exceeds about 40 (this was on a Sun
> Fire x2200 m2, dual core) which sounds similar to the issue you are seeing.
> You can see the details of the benchmark on my blog
> http://blogs.sun.com/elambert. I've had it as a background task to root
> cause the issue and now that I see that it is happening in the real world
> and not just my benchmark I'll give this some attention in the very near
> future.
>
> I do have a couple of comments, although I don't know how useful they'll be:
>
> 1) In my benchmark I noticed that throughput plateaus at about 20 threads
> and that adding more threads (threads 21 - 39) after this point does not
> appear to significantly change throughput. Do you know if your clients have
> hit such a plateau? In which case, maybe dialing down the concurrency is the
> right call since adding more threads is not getting you anything.
>
> 2) I also noticed that when I run the benchmark on older non multi-core
> hardware (say Sunfire v20z with two cpus), I don't see the performance drop
> (which I am sure is a big clue as to cause).
>
> I'll try and spend some time looking into this in the next day or two and
> let you know what I find.
>
> Eric
>
>
> Adam Lee wrote:
>>
>> we recently made the switch from the whalin client to spy and seem to
>> be running into problems under heavy concurrency/load in our front-end
>> servers and i was wondering if anybody (dustin, perhaps?) had any
>> ideas for strategies to deal with it.
>>
>> the majority of our front-end servers are sun fire t1000s (8 cores, 4
>> threads per core) running solaris 10, so obviously the spy client
>> works a lot better for us in the vast majority of cases-- the
>> synchronized blocks in the whalin connection pool gave us a lot of
>> contention problems in particular. when the systems get busy, though,
>> it seems that i/o can't keep up and we start seeing a lot of timeouts,
>> which in turn has a domino effect and effectively brings down the
>> entire cluster. the problem is that the machines aren't even reaching
>> 60% cpu when this happens.
>>
>> does my diagnosis of the problem seem right and, if so, any ideas for
>> the best way to deal with this? obviously adjusting timeouts would
>> probably only exacerbate the problem, so i toyed with the idea of
>> having a pool of clients (though i haven't really delved into the code
>> to see if that's feasible or would help at all) or possibly hacking it
>> to change how its i/o threads work. for now, we've just added a few
>> more machines to this cluster, but it seems like a waste of hardware
>> when i know that these things can operate above 90% cpu for a
>> sustained period with no problem.
>>
>> thanks... any help would be great and let me know if you have any more
>> questions about specifics.
>>
>> --
>> awl
>>
>
>
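For anyone who wants to try the "pool of clients" idea Adam mentions, here is a minimal sketch of one way it might look. It assumes a fixed number of client instances created up front and handed out round-robin, so request threads are spread across several clients (and their i/o threads) instead of sharing one. `RoundRobinPool` and its methods are hypothetical names, not part of the spy client; the type parameter stands in for whatever client class you use, and nothing in this thread confirms that pooling actually fixes the timeouts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of any memcached client library):
 * builds a fixed set of clients and hands them out in rotation.
 */
final class RoundRobinPool<T> {
    private final List<T> clients;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<T> built = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            built.add(factory.get()); // one independent client per slot
        }
        this.clients = Collections.unmodifiableList(built);
    }

    /** Returns the next client in rotation; lock-free and safe to call from many threads. */
    T get() {
        // floorMod keeps the index non-negative after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), clients.size());
        return clients.get(i);
    }

    int size() {
        return clients.size();
    }
}
```

The factory would be whatever constructs one client against your server list. The pool size is a tuning knob: given the plateau around 20 threads that Eric describes, it may be worth measuring with a few different sizes rather than assuming more is better.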
