On Tuesday, 8 November 2005 13:06, Oded Arbel wrote:
> On Sunday, 6 November 2005 22:13, Yedidyah Bar-David wrote:
> > Maybe one of the scripts/daemons has a loop of quite short delays?
> > Testing this isn't very easy - you can either strace some of the
> > suspects or try something like syscalltrack.
> Then I started removing processes until I got to the culprit - the
> Java program that implements the services provided by the server. I
> of course did the testing in the off-peak hours so there would be no
> disruption of service to our client. At that time there was
> absolutely no activity whatsoever on any of the services, so the only
> thing the Java program was supposed to do was call wait() (a Java
> thread synchronization call) every second, which was indeed verified
> by stracing the Java process, and here is the output:
>
> futex(0x4d907b60, FUTEX_WAIT, 233, {0, 265545000}) = -1 ETIMEDOUT (Connection timed out)
> futex(0x805d33c, FUTEX_WAKE, 1) = 0
> gettimeofday({1131445296, 417683}, NULL) = 0
> clock_gettime(0, {1131445296, 417799000}) = 0
I found the problem - the Java process, which was supposed to be only in
wait() state (which I assume is what all the futex calls were about),
had a thread that was busy-looping. It was actually going very quickly
back and forth through a memory barrier (synchronization), which might
have explained the futex calls had I not expected that to happen far
more often than about once a second.
I fixed the code and now the machine is down to a more reasonable usage
- 0.30 under normal load conditions.
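To illustrate the kind of bug described above (this is not the actual server code, and the class and method names are made up for the example), here is a minimal sketch contrasting a thread that keeps entering and leaving a synchronized block with one that blocks in a timed wait(). The boolean flag and the timings are assumptions.

```java
public class WaitVersusSpin {
    private final Object lock = new Object();
    private boolean workAvailable = false; // guarded by lock

    // Anti-pattern: polls the flag by taking and releasing the monitor
    // in a tight loop, so the thread never blocks and burns CPU.
    public long spinUntilWork(long maxMillis) {
        long iterations = 0;
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            synchronized (lock) {
                if (workAvailable) {
                    return iterations;
                }
            }
            iterations++;
        }
        return iterations;
    }

    // Blocking alternative: the thread sleeps inside wait() until it is
    // notified or the timeout expires, so it wakes up only rarely.
    public long waitUntilWork(long maxMillis) {
        long wakeups = 0;
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        synchronized (lock) {
            while (!workAvailable) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    break;
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    break;
                }
                wakeups++;
            }
        }
        return wakeups;
    }

    // Producer side: set the flag and wake any waiting thread.
    public void postWork() {
        synchronized (lock) {
            workAvailable = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) {
        WaitVersusSpin w = new WaitVersusSpin();
        System.out.println("spin iterations: " + w.spinUntilWork(200));
        System.out.println("wait wakeups:    " + w.waitUntilWork(200));
    }
}
```

With no work posted, the spinning version should report a very large iteration count for the same interval in which the waiting version wakes up only a handful of times.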
I still don't understand why the Java process wasn't showing on the
ps/top list - it didn't even have a lot of 'total cpu time' allocated
to it.
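This does not explain the ps/top behaviour, but as a cross-check from inside the JVM, the standard java.lang.management API (available since Java 5) can report CPU time per thread. A minimal sketch, assuming the JVM supports and enables thread CPU timing; the class name ThreadCpuReport is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuReport {
    // Builds one line per live thread with its accumulated CPU time.
    public static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return "per-thread CPU time is not available on this JVM\n";
        }
        StringBuilder sb = new StringBuilder();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id);
            // A thread may have exited between the two calls.
            if (info == null || cpuNanos < 0) {
                continue;
            }
            sb.append(String.format("%-40s %8d ms%n",
                    info.getThreadName(), cpuNanos / 1_000_000L));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling report() twice a few seconds apart and comparing the numbers should make a busy-looping thread stand out, since its CPU time grows by roughly the elapsed wall-clock time.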
--
Oded
::..
Proofread carefully to see if you any words out.