OK. Let's go.
Right now I have about 100 simultaneous users. The box has 4 CPUs and is
running JBoss.
My application is very slow. Looking at prstat I see many threads, but most of
the time only one or two of them are actually running at the same time.
Using the prstat -Lm options I see that most of the time the threads are in
the LCK/SLP state.
Using plockstat I saw a lot of lock activity in malloc, so I switched to
libumem. I no longer see contention (spin or block) events, just many mutex
hold events, but the problem still exists. I have 7 boxes running JBoss, and
all of them show the same problem.
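(For reference, libumem is normally interposed by preloading it before the JVM
starts, roughly like the lines below in the JBoss start script; the exact
script lines here are just an illustration.)

# Interpose libumem's malloc/free in place of libc's before starting the JVM.
LD_PRELOAD=libumem.so.1
export LD_PRELOAD
# ...then launch JBoss as usual (run.sh shown only as an example).
./run.sh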
One doubt: is there any difference in the output when running prstat with a
1-second interval versus a 5-second interval? With a 1-second interval I get
threads at 100% LCK.
*******************************************************************
Prstat (now), LCK got a little better after I switched to libumem:
[EMAIL PROTECTED]:/] # prstat -Lmp 25736
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
25736 root 57 0.0 0.0 0.0 0.0 43 0.0 0.1 36 83 76 0 java/2
25736 root 0.6 0.5 0.0 0.0 0.0 16 78 0.0 32 0 1K 0 java/7211
25736 root 0.4 0.6 0.0 0.0 0.0 29 70 0.0 19 5 1K 0 java/7237
25736 root 0.4 0.6 0.0 0.0 0.0 27 61 0.2 25 15 1K 0 java/7364
25736 root 0.4 0.5 0.0 0.0 0.0 20 80 0.0 19 8 1K 0 java/7218
25736 root 0.3 0.4 0.0 0.0 0.0 29 70 0.0 18 5 1K 0 java/7204
25736 root 0.3 0.4 0.0 0.0 0.0 16 73 0.0 23 6 864 0 java/7374
25736 root 0.2 0.4 0.0 0.0 0.0 20 79 0.0 9 11 1K 0 java/7305
25736 root 0.3 0.4 0.0 0.0 0.0 23 76 0.0 7 9 900 0 java/7475
25736 root 0.4 0.3 0.0 0.0 0.0 11 88 0.0 15 1 553 0 java/7236
25736 root 0.3 0.3 0.0 0.0 0.0 0.8 93 0.0 17 2 734 0 java/7370
25736 root 0.1 0.3 0.0 0.0 0.0 8.7 91 0.0 7 2 773 0 java/7235
25736 root 0.2 0.2 0.0 0.0 0.0 9.5 90 0.0 8 0 316 0 java/7219
25736 root 0.2 0.1 0.0 0.0 0.0 9.8 84 0.0 17 11 306 0 java/7269
25736 root 0.1 0.1 0.0 0.0 0.0 0.1 100 0.0 2 1 419 0 java/7471
25736 root 0.2 0.0 0.0 0.0 0.0 8.0 92 0.0 14 3 68 0 java/7264
25736 root 0.1 0.1 0.0 0.0 0.0 10 89 0.0 6 0 327 0 java/7470
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.0 10 0 302 0 java/7365
25736 root 0.2 0.0 0.0 0.0 0.0 6.4 88 0.1 14 5 49 0 java/7317
25736 root 0.1 0.1 0.0 0.0 0.0 93 7.0 0.1 4 4 209 0 java/7367
25736 root 0.1 0.0 0.0 0.0 0.0 4.0 91 0.0 10 1 55 0 java/7229
25736 root 0.1 0.0 0.0 0.0 0.0 3.9 96 0.0 11 1 57 0 java/7304
25736 root 0.1 0.1 0.0 0.0 0.0 8.6 91 0.0 4 1 196 0 java/14
25736 root 0.0 0.1 0.0 0.0 0.0 0.0 100 0.0 1 1 263 0 java/7297
25736 root 0.1 0.1 0.0 0.0 0.0 0.0 100 0.1 154 0 86 0 java/10
25736 root 0.1 0.0 0.0 0.0 0.0 0.6 99 0.0 9 0 55 0 java/7377
25736 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 12 4 54 0 java/4
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 16 0 40 0 java/3
25736 root 0.0 0.0 0.0 0.0 0.0 3.6 96 0.0 2 0 4 0 java/7481
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 8 0 12 0 java/170
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 12 0 12 0 java/231
25736 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 19 4 25 0 java/171
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7221
25736 root 0.0 0.0 0.0 0.0 0.0 95 0.0 0.1 20 0 20 0 java/167
25736 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 2 0 java/7378
Total: 1 processes, 256 lwps, load averages: 0.74, 0.68, 0.62
******************************************************************
Plockstat (now):
[EMAIL PROTECTED]:/dtrace] # plockstat -A -e 30 -p 25736
0
Mutex hold
Count nsec Lock Caller
-------------------------------------------------------------------------------
3546 4765 0x8079580 libumem.so.1`umem_cache_alloc+0xc1
3509 3865 0x8079580 libumem.so.1`umem_cache_free+0x194
328 12542 0x8073030 libumem.so.1`vmem_xfree+0xfe
3 1340378 libumem.so.1`umem_cache_lock libumem.so.1`umem_cache_applyall+0x51
261 13414 0x8073030 libumem.so.1`vmem_alloc+0x13c
471 5696 0x807b680 libumem.so.1`umem_cache_alloc+0xc1
475 5441 0x807b680 libumem.so.1`umem_cache_free+0x194
123 20767 0x807b6c0 libumem.so.1`umem_cache_free+0x194
28 90103 0x807a1c0 libumem.so.1`umem_cache_free+0x194
329 6788 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_alloc+0x126
309 6900 0x807ad80 libumem.so.1`umem_cache_alloc+0xc1
329 6310 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x43b
328 6248 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_xfree+0x127
355 5727 0x807b300 libumem.so.1`umem_cache_alloc+0xc1
216 9080 0x807b940 libumem.so.1`umem_cache_alloc+0xc1
355 5338 0x807b300 libumem.so.1`umem_cache_free+0x194
328 5393 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xfree+0xfe
310 5308 0x807ad80 libumem.so.1`umem_cache_free+0x194
329 4921 0x8073030 libumem.so.1`vmem_xalloc+0x43b
154 10299 0x807b9c0 libumem.so.1`umem_cache_alloc+0xc1
216 6043 0x807b940 libumem.so.1`umem_cache_free+0x194
329 3960 0x8073030 libumem.so.1`vmem_xalloc+0x315
161 7968 0x807b980 libumem.so.1`umem_cache_alloc+0xc1
235 5438 0x807b500 libumem.so.1`umem_cache_alloc+0xc1
329 3784 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_alloc+0x13c
241 5029 0x807b500 libumem.so.1`umem_cache_free+0x194
85 13906 0x8082c80 libumem.so.1`umem_cache_alloc+0xc1
329 3577 libumem.so.1`vmem0+0x4a4 libumem.so.1`vmem_xalloc+0x315
39 29926 0x8079980 libumem.so.1`umem_cache_free+0x194
130 8876 0x807ba00 libumem.so.1`umem_cache_alloc+0xc1
34 33388 0xc8e149c0 libc.so.1`thr_suspend+0x1a
155 7304 0x807ba40 libumem.so.1`umem_cache_alloc+0xc1
13 86856 0x807ac80 libumem.so.1`umem_cache_alloc+0xc1
94 11979 0x80828c0 libumem.so.1`umem_cache_alloc+0xc1
143 7742 0x807b5c0 libumem.so.1`umem_cache_alloc+0xc1
153 5759 0x807b5c0 libumem.so.1`umem_cache_free+0x194
98 8816 0x807b540 libumem.so.1`umem_cache_alloc+0xc1
115 7499 0x807b8c0 libumem.so.1`umem_cache_alloc+0xc1
26 32518 0xc8e12100 libc.so.1`thr_suspend+0x1a
158 5210 0x807b640 libumem.so.1`umem_cache_free+0x194
95 8504 0x80828c0 libumem.so.1`umem_cache_free+0x194
155 5199 0x807ba40 libumem.so.1`umem_cache_free+0x194
8 97711 0xc8e145c0 libc.so.1`thr_suspend+0x1a
130 5915 0x807ba00 libumem.so.1`umem_cache_free+0x194
41 18736 0x807b2c0 libumem.so.1`umem_cache_alloc+0xc1
101 7524 0x8082940 libumem.so.1`umem_cache_alloc+0xc1
33 22842 0x807d640 libumem.so.1`umem_cache_free+0x194
35 21064 0x807acc0 libumem.so.1`umem_cache_alloc+0xc1
16 45814 0xc8e12900 libc.so.1`thr_suspend+0x1a
129 5641 0x807b880 libumem.so.1`umem_cache_free+0x194
5 142887 0xc8e13100 libc.so.1`thr_suspend+0x1a
43 16584 0x807b240 libumem.so.1`umem_cache_alloc+0xc1
21 33838 0xc8e12b80 libc.so.1`thr_suspend+0x1a
101 7012 0x8082940 libumem.so.1`umem_cache_free+0x194
8 42677 0xc8e13f00 libc.so.1`thr_suspend+0x1a
7 48677 0xc8e12940 libc.so.1`thr_suspend+0x1a
9 37763 0xc8e13080 libc.so.1`thr_suspend+0x1a
30 11317 0x8079880 libumem.so.1`umem_cache_alloc+0xc1
61 5553 0x807fcc0 libumem.so.1`umem_cache_alloc+0xc1
7 38522 0xc8e14240 libc.so.1`thr_suspend+0x1a
47 5664 0x8079cc0
26 9348 0x807a280 libumem.so.1`umem_cache_alloc+0xc1
6 39964 0xc8e14480 libc.so.1`thr_suspend+0x1a
26 9198 0x807bc40 libumem.so.1`umem_cache_alloc+0xc1
5 47600 0xc8e15440 libc.so.1`thr_suspend+0x1a
30 7853 0x807a240 libumem.so.1`umem_cache_alloc+0xc1
14 16771 0x80824b4 libumem.so.1`umem_depot_alloc+0xa4
3 3747 0x807f4b4 libumem.so.1`umem_cache_update+0xe5
3 3743 0x807d134 libumem.so.1`umem_cache_update+0xe5
(...)
Mutex block
Count nsec Lock Caller
-------------------------------------------------------------------------------
1 1624523 0x8073030 libumem.so.1`vmem_xfree+0x24
1 23803 0x80825c0 libumem.so.1`umem_cache_free+0x51
1 18619 0x8073030 libumem.so.1`vmem_xfree+0x24
1 15304 0x807b5c0 libumem.so.1`umem_cache_free+0x51
******************************************************************
# time dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p 25736
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
/zonas/srvintra/root/var/tmp/imageio158908.tmp 16
/zonas/srvintra/root/var/tmp/imageio158909.tmp 16
/zonas/srvintra/root/var/tmp/imageio158910.tmp 16
/zonas/srvintra/root/var/tmp/imageio158911.tmp 16
/zonas/srvintra/root/var/tmp/imageio158912.tmp 16
/zonas/srvintra/root/var/tmp/imageio158913.tmp 16
/zonas/srvintra/root/var/tmp/imageio158914.tmp 16
/zonas/srvintra/root/var/tmp/imageio158915.tmp 16
/zonas/srvintra/root/var/tmp/imageio158916.tmp 16
/zonas/srvintra/root/var/tmp/imageio158917.tmp 16
/zonas/srvintra/root/var/tmp/imageio158918.tmp 16
/zonas/srvintra/root/var/tmp/imageio158919.tmp 16
/zonas/srvintra/root/var/tmp/imageio158920.tmp 16
/zonas/srvintra/root/var/tmp/imageio158921.tmp 16
/zonas/srvintra/root/var/tmp/imageio158922.tmp 16
/zonas/srvintra/root/var/tmp/imageio158923.tmp 16
/zonas/srvintra/root/var/tmp/imageio158924.tmp 16
/zonas/srvintra/root/var/tmp/imageio158925.tmp 16
/zonas/srvintra/root/var/tmp/imageio158926.tmp 16
/zonas/srvintra/root/var/tmp/imageio158927.tmp 16
/zonas/srvintra/root/var/tmp/imageio158928.tmp 16
/zonas/srvintra/root/var/tmp/imageio158929.tmp 16
/zonas/srvintra/root/var/tmp/imageio158930.tmp 16
/zonas/srvintra/root/var/tmp/imageio158931.tmp 16
/zonas/srvintra/root/var/tmp/imageio158932.tmp 16
/zonas/srvintra/root/var/tmp/imageio158933.tmp 16
/zonas/srvintra/root/var/tmp/imageio158934.tmp 16
/zonas/srvintra/root/var/tmp/imageio158935.tmp 16
/zonas/srvintra/root/var/tmp/imageio158936.tmp 16
/zonas/srvintra/root/var/tmp/imageio158937.tmp 16
/zonas/srvintra/root/var/tmp/imageio158938.tmp 16
/zonas/srvintra/root/var/tmp/imageio158939.tmp 16
/zonas/srvintra/root/var/tmp/imageio158940.tmp 16
/zonas/srvintra/root/var/tmp/imageio158941.tmp 16
/zonas/srvintra/root/var/tmp/imageio158942.tmp 16
/zonas/srvintra/root/var/tmp/imageio158943.tmp 16
/zonas/srvintra/root/var/tmp/imageio158944.tmp 16
/zonas/srvintra/root/var/tmp/imageio158945.tmp 16
/zonas/srvintra/root/var/tmp/imageio158946.tmp 16
/zonas/srvintra/root/var/tmp/imageio158947.tmp 16
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04_23.controller.csv 247
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/deploy/detran.war/WEB-INF/control-detran.log 784
/zonas/srvintra/root/opt/jboss-3.2.3_jetty-4.2.17/server/default/log/2008_04_23.request.log 89872
real 0m46.632s
user 0m0.535s
sys 0m0.412s
*************************************************************
# time dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p 5736
dtrace: description 'pid$target:libc:malloc:entry ' matched 1 probe
^C
real 0m52.971s
user 0m0.581s
sys 0m0.452s
Regards,
------------------------------------------------------------------
Kleyson Rios.
Technical Support Management
Support Analyst / Team Leader
Governo do Estado de Goiás
Agência de Administração
Diretoria de Informática
+55 62 3201-6582
"Você sabe a diferença entre pessoas bem sucedidas e pessoas mal sucedidas ?
Pessoas bem sucedidas estão dispostas a fazer as coisas que as pessoas mal
sucedidas não estão."
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 23, 2008 13:08
To: Kleyson Rios
Cc: [email protected]
Subject: Re: RES: Process in LCK / SLP (Please)
D'oh! My bad. My brain was thinking malloc and lock contention; I saw the
network calls on the stack, connected the dots and started typing. I should
have looked at the top of the stack. Sorry.
Let's back up a minute. You have a Java application and the JVM has a
large number of threads in LCK time (user lock), right?
plockstat showed a lot of lock activity in libc malloc, at which point
you tried mtmalloc and libumem, neither of which made much of
a difference, right?
We know empirically (and heard from Jarod) that LCK can be deceiving,
because threads that call cond_wait(3C) to wait on a condition variable
will be charged with LCK time, but it's really sleep time most of the time.
So it is sometimes an artifact of the application design that threads show
up at 100% LCK time - threads are waiting for a cond_signal() (or
cond_broadcast())
to be put to work, but there's no work to do. Anyway...
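(One quick way to confirm that, an untested sketch assuming the threads are
parking in libc's cond_wait(3C), is to count which user stacks enter cond_wait
over a sampling window:)

# Count user stacks entering cond_wait(3C) in the target JVM; Ctrl-C to print.
dtrace -n 'pid$target:libc:cond_wait:entry { @[ustack(8)] = count(); }' -p [PID_OF_JVM]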
The problem I thought we were chasing was lock activity in malloc.
If you have more recent data (plockstat, prstat) please pass it to me (sorry
I missed it), and I'll get back in sync on the current problem.
I would add that there's nothing here that I think belongs on dtrace-discuss;
I would start posting to the performance group and a Java performance alias.
Also, and I'm sorry if I missed this, how well (or poorly) is the Java
application performing, and what metric do we have to determine application
performance?
Ricky's script grabs a user stack when a thread goes off CPU to sleep, and
tallies what the threads are sleeping on. It's mostly condition variables,
which doesn't really help us much (everything sleeps on CVs... mostly). There
are a lot of user locks in there as well. So...
The CVs look like threads blocking on writes, including writes to a
network socket.
You need to figure out the write target, and go from there.
Get the PID of the target JVM, then:
dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = count(); }' -p [PID_OF_JVM]
The above will list the files being written to. The next step depends on
what we see.
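(If the 2k theory is in play, a variation worth trying, same caveats and also
untested here, is to aggregate on the write size per file; arg2 of write(2) is
the byte count:)

# Distribution of write(2) sizes per target file for the JVM; Ctrl-C to print.
dtrace -n 'syscall::write:entry / pid == $target / { @[fds[arg0].fi_pathname] = quantize(arg2); }' -p [PID_OF_JVM]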
The same goes for the user locks. If they are indeed malloc locks (which I
think they are), I suspect that will come back around to network I/Os larger
than 2k. Try this:
dtrace -n 'pid$target:libc:malloc:entry { @j[jstack()] = count(); }' -p [PID_OF_JVM]
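(To check whether the allocations really are the larger-than-2k network
buffers, aggregating on the requested size should also help; another untested
sketch, arg0 of malloc() is the size in bytes:)

# Distribution of malloc() request sizes in the target JVM; Ctrl-C to print.
dtrace -n 'pid$target:libc:malloc:entry { @["bytes"] = quantize(arg0); }' -p [PID_OF_JVM]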
Thanks,
/jim
Kleyson Rios wrote:
> Hi Jim,
>
> But if there are problems with malloc for network buffers, shouldn't I see
> the locks when running plockstat?
>
>
> Regards,
>
> ------------------------------------------------------------------
>
> Kleyson Rios.
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 23, 2008 01:11
> To: Kleyson Rios
> Cc: [email protected]
> Subject: Re: [dtrace-discuss] RES: RES: Process in LCK / SLP (Please)
>
> You may want to cross-post to a Java alias, but I've been down this
> road before.
>
> Java will call into malloc() for buffers for network reads and writes
> that are larger than 2k bytes (the 2k is from memory, and I think it
> was a 1.5 JVM). A large number of malloc calls,
> and resulting contention on locks in the library, are due to the application
> doing network writes of larger than 2k.
>
> Newer JVMs (1.6) may improve this, but I'm not sure. There's also
> an alternative set of classes and methods, NIO, which also can
> help (although I've heard tell that NIO brings other problems along
> with it, but I can not speak from personal experience).
>
> At this point, I think you need to consult with Java experts to determine
> what options you have for buffer allocation for network IO from the
> Java heap, versus the current behavior of the JVM dropping back
> to malloc for allocating buffers for network IO.
>
> The other option of course is determining if the code can be changed to
> use buffers smaller than 2k.
>
> Thanks,
> /jim
>
>
>
>
> Kleyson Rios wrote:
>
>> OK Jonathan,
>>
>> I understand.
>>
>> So, looking at the right place now, I can see few locks and sometimes no
>> locks (just Mutex hold). But I still have many threads at 100% LCK.
>>
>> If I don't have a lot of locks, where is my problem?
>>
>> Running Rickey C. Weisner's script I get:
>>
>> (...)
>> 25736
>> libc.so.1`_so_send+0x15
>> libjvm.so`__1cDhpiEsend6Fipcii_i_+0x67
>> libjvm.so`JVM_Send+0x32
>> libnet.so`Java_java_net_SocketOutputStream_socketWrite0+0x131
>> 0xc3c098d3
>> 10
>> 25736
>> 0xc3d2a33a
>> 14
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f7c04b
>> 16
>> 25736
>> libc.so.1`stat64+0x15
>> 21
>> 25736
>> libc.so.1`_write+0x15
>> libjvm.so`__1cDhpiFwrite6FipkvI_I_+0x5d
>> libjvm.so`JVM_Write+0x30
>> libjava.so`0xc8f80ce9
>> 76
>> java 25736 kernel-level lock 1
>> java 25736 shuttle 6
>> java 25736 preempted 7
>> java 25736 user-level lock 511
>> java 25736 condition variable 748
>>
>>
>> Regards,
>>
>> ------------------------------------------------------------------
>>
>> Kleyson Rios.
>> Technical Support Management
>> Support Analyst / Team Leader
>>
>>
>> -----Original Message-----
>> From: Jonathan Adams [mailto:[EMAIL PROTECTED]
>> Sent: Friday, April 18, 2008 15:40
>> To: Kleyson Rios
>> Cc: [email protected]
>> Subject: Re: [dtrace-discuss] RES: Process in LCK / SLP (Please)
>>
>>
>> On Apr 18, 2008, at 1:03 PM, Kleyson Rios wrote:
>>
>>
>>> Hi przemol,
>>>
>>> Below is the plockstat output for both malloc and libumem. Both show many locks.
>>> Why didn't I get fewer locks after changing to libumem?
>>>
>>>
>>>
>> You're looking at Mutex hold statistics, which don't mean a lot
>> (unless contention is caused by long hold times).
>>
>> The important thing for multi-threaded performance is *contention*.
>> (Spinning and blocking) Those are the statistics you should be
>> looking at.
>>
>> Both malloc and libumem use locks to protect their state; libumem
>> just uses many locks, in order to reduce contention.
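>>
>> (For example, restricting plockstat to contention events only, a sketch
>> along the lines of the earlier 30-second runs:)
>>
>> plockstat -C -e 30 -p [PID_OF_JVM]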
>>
>> Cheers,
>> - jonathan
>>
>>
>>
>>
_______________________________________________
dtrace-discuss mailing list
[email protected]