Sure, I'll put it on my GitHub account in the next couple of days.
Moazam Raja wrote:
Eric, would it be possible for you to publish your test setup so others
can try it on different hardware and/or tweak the config a bit?
Thanks.
-Moazam
On Fri, Mar 6, 2009 at 8:22 AM, Eric Lambert wrote:
Hey Adam:
I've done some simple profiling of the spy client and noticed a drop-off in
performance when the number of threads exceeds about 40 (this was on a Sun
Fire X2200 M2, dual core), which sounds similar to the issue you are seeing.
You can see the details of the benchmark on my blog,
http://blogs.sun.com/elambert. I've had it as a background task to root-cause
the issue, and now that I see that it is happening in the real world
and not just in my benchmark, I'll give this some attention in the very near
future.
I do have a couple of comments, although I don't know how useful they'll be:
1) In my benchmark I noticed that throughput plateaus at about 20 threads
and that adding more threads (threads 21 - 39) after this point does not
appear to significantly change throughput. Do you know if your clients have
hit such a plateau? If so, maybe dialing down the concurrency is the
right call, since adding more threads is not getting you anything.
2) I also noticed that when I run the benchmark on older, non-multi-core
hardware (say, a Sun Fire V20z with two CPUs), I don't see the performance drop
(which I am sure is a big clue as to the cause).
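For point 1, one way to dial down the concurrency without touching the client itself is to cap how many application threads may be inside a cache call at once. The sketch below is only an illustration: BoundedCalls, maxConcurrent and waitMillis are hypothetical names, not part of the spy client, and the right cap (about 20 in the benchmark above) would have to be measured on your own hardware.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: caps how many threads may be inside a cache call at once. */
final class BoundedCalls {
    private final Semaphore permits;

    BoundedCalls(int maxConcurrent) {
        // Fair semaphore, so waiting threads are served in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /**
     * Runs cacheCall if a permit becomes free within waitMillis;
     * otherwise throws TimeoutException without running it.
     */
    <T> T call(Callable<T> cacheCall, long waitMillis) throws Exception {
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no permit within " + waitMillis + " ms");
        }
        try {
            return cacheCall.call();
        } finally {
            // Always hand the permit back, even if the call failed.
            permits.release();
        }
    }

    /** Number of permits currently free (useful for monitoring). */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A caller would wrap each cache lookup, for example limiter.call(() -> lookup(key), 50), where lookup stands in for whatever the application does with its client. Failing fast when no permit is free lets the caller fall back to the database instead of queueing behind an already saturated client.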
I'll try and spend some time looking into this in the next day or two and
let you know what I find.
Eric
Adam Lee wrote:
we recently made the switch from the whalin client to spy and seem to
be running into problems under heavy concurrency/load in our front-end
servers and i was wondering if anybody (dustin, perhaps?) had any
ideas for strategies to deal with it.
the majority of our front-end servers are sun fire t1000s (8 cores, 4
threads per core) running solaris 10, so obviously the spy client
works a lot better for us in the vast majority of cases-- the
synchronized blocks in the whalin connection pool gave us a lot of
contention problems in particular. when the systems get busy, though,
it seems that i/o can't keep up and we start seeing a lot of timeouts,
which in turn has a domino effect and effectively brings down the
entire cluster. the problem is that the machines aren't even reaching
60% cpu when this happens.
does my diagnosis of the problem seem right and, if so, any ideas for
the best way to deal with this? obviously adjusting timeouts would
probably only exacerbate the problem, so i toyed with the idea of
having a pool of clients (though i haven't really delved into the code
to see if that's feasible or would help at all) or possibly hacking it
to change how its i/o threads work. for now, we've just added a few
more machines to this cluster, but it seems like a waste of hardware
when i know that these things can operate above 90% cpu for a
sustained period with no problem.
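The pool-of-clients idea above could be sketched roughly as follows. This is a generic, lock-free round-robin pool, not anything taken from the spy client: RoundRobinPool and pick() are hypothetical names, and the sketch assumes that each client instance drives its own I/O thread, so that spreading callers over several instances also spreads the I/O work. Whether that actually helps on a T1000 is untested here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool that hands out its members in round-robin order. */
final class RoundRobinPool<C> {
    private final List<C> clients;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(int size, Supplier<C> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<C> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get()); // e.g. one memcached client per slot
        }
        this.clients = Collections.unmodifiableList(created);
    }

    /**
     * Returns the next client in rotation. Uses an atomic counter rather
     * than a synchronized block, so callers never wait on a lock here.
     */
    C pick() {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), clients.size());
        return clients.get(index);
    }

    int size() {
        return clients.size();
    }
}
```

Clients are shared rather than checked out and returned, which only works if each client is safe to use from many threads at once. The factory would be whatever constructs a client for your server list; pool.pick() then replaces direct use of a single shared instance.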
thanks... any help would be great and let me know if you have any more
questions about specifics.
--
awl