Hello Somnath,
> Thanks for the perf data.. It seems innocuous.. I am not seeing a single 
> tcmalloc trace; are you running with tcmalloc, by the way?

according to ldd, it seems tcmalloc is linked in, yes:
[root@vfnphav1a ~]# ldd /usr/bin/ceph-osd
.
.
libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (0x00007f7a3756e000)
.
.


> What about my other question, is the performance of slow volume increasing if 
> you stop IO on the other volume ?
I don't have any other ceph users; actually the whole cluster is idle..

> Are you using default ceph.conf ? Probably, you want to try with different 
> osd_op_num_shards (may be = 10 , based on your osd server config) and 
> osd_op_num_threads_per_shard (may be = 1). Also, you may want to see the 
> effect by doing osd_enable_op_tracker = false

I guess I'm using pretty much default settings; the few changes I've made are 
probably not related:

[osd]
osd crush update on start = false

[client]
rbd cache = true
rbd cache writethrough until flush = true

[mon]
debug paxos = 0



I now tried setting
throttler_perf_counter = false
osd_enable_op_tracker = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10

and restarted all ceph daemons, but it seems to make no big difference..
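For the record, here is the same set of settings as a ceph.conf fragment (a 
sketch; the file name is just an example). As far as I know, the op tracker can 
also be flipped at runtime with injectargs, while the shard/thread counts are 
only read at OSD startup, hence the restart:

```shell
# Write the tried settings as a ceph.conf fragment (illustrative file name).
cat > osd-tuning.conf <<'EOF'
[osd]
throttler_perf_counter = false
osd_enable_op_tracker = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10
EOF

# The op tracker can be toggled without a restart:
#   ceph tell osd.* injectargs '--osd_enable_op_tracker=false'
# The shard/thread counts are read only at startup, so those need the restart.
```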


> 
> Are you seeing similar resource consumption on both the servers while IO is 
> going on ?
yes, on all three nodes ceph-osd seems to be consuming a lot of CPU during the 
benchmark.

> 
> Need some information about your client, are the volumes exposed with krbd or 
> running with librbd environment ? If krbd and with same physical box, hope 
> you mapped the images with 'noshare' enabled.

I'm using fio with the rbd ioengine (librbd), so I guess no krbd-related stuff 
is in use here?
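To illustrate the two client paths (a sketch; pool/image names copied from the 
fio command below): with the rbd ioengine, fio talks to librbd directly and no 
kernel device exists, so 'noshare' does not apply. It only matters for krbd 
mappings, where it gives each mapping its own client instance:

```shell
# librbd path: a fio job file equivalent to the command line used below;
# no /dev/rbd* device is created.
cat > rbd-randread.fio <<'EOF'
[randread-test]
ioengine=rbd
pool=ssd3r
rbdname=vmtst23-6
direct=1
bs=4k
iodepth=64
rw=randread
randrepeat=1
gtod_reduce=1
EOF
# Run with:  fio rbd-randread.fio

# krbd path (not used here): map first, then benchmark the block device.
# 'noshare' prevents two mappings from sharing one client session:
#   rbd map -o noshare ssd3r/vmtst23-6    # creates e.g. /dev/rbd0
#   fio --ioengine=libaio --filename=/dev/rbd0 ...
```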


> 
> Too many questions :-)  But, this may give some indication what is going on 
> there.
:-) hopefully my answers are not too confusing; I'm still pretty new to ceph..

BR

nik


> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Nikola Ciprich [mailto:[email protected]] 
> Sent: Sunday, April 26, 2015 7:32 AM
> To: Somnath Roy
> Cc: [email protected]; [email protected]
> Subject: Re: [ceph-users] very different performance on two volumes in the 
> same pool
> 
> Hello Somnath,
> 
> On Fri, Apr 24, 2015 at 04:23:19PM +0000, Somnath Roy wrote:
> > This could be again because of tcmalloc issue I reported earlier.
> > 
> > Two things to observe.
> > 
> > 1. Is the performance improving if you stop IO on other volume ? If so, it 
> > could be different issue.
> there is no other IO.. only cephfs mounted, but no users of it.
> 
> > 
> > 2. Run perf top in the OSD node and see if tcmalloc traces are popping up.
> 
> don't see anything special:
> 
>   3.34%  libc-2.12.so                  [.] _int_malloc
>   2.87%  libc-2.12.so                  [.] _int_free
>   2.79%  [vdso]                        [.] __vdso_gettimeofday
>   2.67%  libsoftokn3.so                [.] 0x000000000001fad9
>   2.34%  libfreeblpriv3.so             [.] 0x00000000000355e6
>   2.33%  libpthread-2.12.so            [.] pthread_mutex_unlock
>   2.19%  libpthread-2.12.so            [.] pthread_mutex_lock
>   1.80%  libc-2.12.so                  [.] malloc
>   1.43%  [kernel]                      [k] do_raw_spin_lock
>   1.42%  libc-2.12.so                  [.] memcpy
>   1.23%  [kernel]                      [k] __switch_to
>   1.19%  [kernel]                      [k] acpi_processor_ffh_cstate_enter
>   1.09%  libc-2.12.so                  [.] malloc_consolidate
>   1.08%  [kernel]                      [k] __schedule
>   1.05%  libtcmalloc.so.4.1.0          [.] 0x0000000000017e6f
>   0.98%  libc-2.12.so                  [.] vfprintf
>   0.83%  libstdc++.so.6.0.13           [.] std::basic_ostream<char, 
> std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> 
> >(std::basic_ostream<char,
>   0.76%  libstdc++.so.6.0.13           [.] 0x000000000008092a
>   0.73%  libc-2.12.so                  [.] __memset_sse2
>   0.72%  libc-2.12.so                  [.] __strlen_sse42
>   0.70%  libstdc++.so.6.0.13           [.] std::basic_streambuf<char, 
> std::char_traits<char> >::xsputn(char const*, long)
>   0.68%  libpthread-2.12.so            [.] pthread_mutex_trylock
>   0.67%  librados.so.2.0.0             [.] ceph_crc32c_sctp
>   0.63%  libpython2.6.so.1.0           [.] 0x000000000007d823
>   0.55%  libnss3.so                    [.] 0x0000000000056d2a
>   0.52%  libc-2.12.so                  [.] free
>   0.50%  libstdc++.so.6.0.13           [.] std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> >::basic_string(std::string 
> const&)
> 
> should I check anything else?
> BR
> nik
> 
> 
> > 
> > Thanks & Regards
> > Somnath
> > 
> > -----Original Message-----
> > From: ceph-users [mailto:[email protected]] On Behalf Of 
> > Nikola Ciprich
> > Sent: Friday, April 24, 2015 7:10 AM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: [ceph-users] very different performance on two volumes in the same 
> > pool
> > 
> > Hello,
> > 
> > I'm trying to solve a bit mysterious situation:
> > 
> > I've got a 3-node Ceph cluster and a pool made of 3 OSDs (one per node); 
> > the OSDs are 1TB SSD drives.
> > 
> > the pool has 3 replicas. I'm measuring random IO performance using fio:
> > 
> > fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test 
> > --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 
> > --readwrite=randread --output=randio.log
> > 
> > it's giving very nice performance of ~ 186K IOPS for random read.
> > 
> > the problem is, I've got one volume which gives only ~20K IOPS and I can't 
> > figure out why. It was created using python, so I first suspected it could 
> > be similar to the missing-layering problem I was consulting about here a 
> > few days ago, but when I tried to reproduce it, I got ~180K IOPS even for 
> > other volumes created using python.
> > 
> > so only this one volume is problematic; the others are fine. Since there is 
> > only one SSD in each box and I'm using 3 replicas, there should not be any 
> > difference in the physical storage used between volumes..
> > 
> > I'm using hammer, 0.94.1, fio 2.2.6.
> > 
> > here's RBD info:
> > 
> > "slow" volume:
> > 
> > [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-6
> > rbd image 'vmtst23-6':
> >     size 30720 MB in 7680 objects
> >     order 22 (4096 kB objects)
> >     block_name_prefix: rbd_data.1376d82ae8944a
> >     format: 2
> >     features:
> >     flags:
> > 
> > "fast" volume:
> > [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-7
> > rbd image 'vmtst23-7':
> >     size 30720 MB in 7680 objects
> >     order 22 (4096 kB objects)
> >     block_name_prefix: rbd_data.13d01d2ae8944a
> >     format: 2
> >     features:
> >     flags:
> > 
> > any idea on what could be wrong here?
> > 
> > thanks a lot in advance!
> > 
> > BR
> > 
> > nik
> > 
> > 
> 
> 

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
