Two things:

1. You should always precondition SSD drives before benchmarking them (see
the sketch after this list).

2. After creating and mapping an RBD LUN, you need to write data to it first
and read it afterwards; otherwise the fio output will be misleading. In fact,
I think you will see that the IO is not even hitting the cluster (check with
ceph -s).
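
A minimal sketch of both points, using fio's rbd engine (the pool and image
names are placeholders, adjust to yours): fill the image with one sequential
write pass, and watch ceph -s while it runs to confirm the IO actually hits
the cluster.

fio --name=fill --ioengine=rbd --pool=ssd3r --rbdname=test \
    --rw=write --bs=1M --iodepth=16 --direct=1

ceph -s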

Now, if it's a 3-OSD setup, then yes, ~23K IOPS is pretty low. Check the
following.

1. Check whether the CPUs on the client or the OSD nodes are saturating.

2. With 4K blocks, network bandwidth is hopefully not the bottleneck, but
verify it.

3. The number of PGs per pool should be ~128 or so (see the commands after
this list).

4. If you are using krbd, you might want to try the latest krbd module, where
the TCP_NODELAY problem is fixed. If you don't want that complexity, try
fio's rbd engine (fio-rbd) instead.
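
For point 3, the PG count can be checked and raised like this (pool name is
a placeholder; pgp_num has to be bumped along with pg_num):

ceph osd pool get ssd3r pg_num
ceph osd pool set ssd3r pg_num 128
ceph osd pool set ssd3r pgp_num 128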

Hope this helps,

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nikola Ciprich
Sent: Sunday, May 10, 2015 9:43 PM
To: ceph-users
Cc: n...@linuxbox.cz
Subject: [ceph-users] very different performance on two volumes in the same pool #2

Hello ceph developers and users,

Some time ago, I posted a question here regarding very different performance
of two volumes in the same pool (backed by SSD drives).

After some examination, I have probably gotten to the root of the problem.

When I create a fresh volume (i.e. rbd create --image-format 2 --size 51200
ssd/test) and run a random IO fio benchmark:

fio --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test \
    --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 \
    --readwrite=randread

I get very nice performance of up to 200k IOPS. However, once the volume has
been written to (i.e. when I map it using rbd map and dd the whole volume
full of random data) and I repeat the benchmark, random read performance
drops to ~23k IOPS.
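
For reference, the write pass is roughly the following (the /dev/rbd0 path
is an assumption; use whatever device rbd map prints):

rbd map ssd/test
dd if=/dev/urandom of=/dev/rbd0 bs=1M oflag=direct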

This leads me to the conjecture that for unwritten (sparse) volumes, a read
is effectively a no-op that simply returns zeroes without having to touch
physical storage, hence the nice numbers; once the volume has been written,
the data actually has to be read, and performance drops. Right?
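
This can be sanity-checked with rbd diff, which lists the allocated extents
of an image: on a freshly created volume it should print next to nothing,
while after the dd pass it should cover the whole image:

rbd diff ssd/test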

However, I'm a bit unhappy about the size of the drop. The pool is backed by
3 SSD drives (each capable of ~100k random read IOPS), one per node, and the
pool replication size is set to 3. The cluster is completely idle; the nodes
are quad-core Xeon E3-1220 v3 @ 3.10GHz with 32GB RAM each, running CentOS 6,
kernel 3.18.12, and ceph 0.94.1. I'm using libtcmalloc (I even tried
upgrading gperftools-libs to 2.4). The nodes are connected via 10Gb Ethernet,
with jumbo frames enabled.
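
For completeness, the replication factor of the pool can be confirmed with:

ceph osd pool get ssd3r size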


I tried tuning the following values:

osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
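
These go in the [osd] section of ceph.conf; for quick experiments they can
also be injected at runtime (though some of them only take effect after an
OSD restart), e.g.:

ceph tell osd.* injectargs '--osd_op_num_shards 25'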

I don't see anything special in perf:

  5.43%  [kernel]              [k] acpi_processor_ffh_cstate_enter
  2.93%  libtcmalloc.so.4.2.6  [.] 0x0000000000017d2c
  2.45%  libpthread-2.12.so    [.] pthread_mutex_lock
  2.37%  libpthread-2.12.so    [.] pthread_mutex_unlock
  2.33%  [kernel]              [k] do_raw_spin_lock
  2.00%  libsoftokn3.so        [.] 0x000000000001f455
  1.96%  [kernel]              [k] __switch_to
  1.32%  [kernel]              [k] __schedule
  1.24%  libstdc++.so.6.0.13   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char
  1.24%  libc-2.12.so          [.] memcpy
  1.19%  libtcmalloc.so.4.2.6  [.] operator delete(void*)
  1.16%  [kernel]              [k] __d_lookup_rcu
  1.09%  libstdc++.so.6.0.13   [.] 0x000000000007d6be
  0.93%  libstdc++.so.6.0.13   [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
  0.93%  ceph-osd              [.] crush_hash32_3
  0.85%  libc-2.12.so          [.] vfprintf
  0.84%  libc-2.12.so          [.] __strlen_sse42
  0.80%  [kernel]              [k] get_futex_key_refs
  0.80%  libpthread-2.12.so    [.] pthread_mutex_trylock
  0.78%  libtcmalloc.so.4.2.6  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
  0.71%  libstdc++.so.6.0.13   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  0.68%  ceph-osd              [.] ceph::log::Log::flush()
  0.66%  libtcmalloc.so.4.2.6  [.] tc_free
  0.63%  [kernel]              [k] resched_curr
  0.63%  [kernel]              [k] page_fault
  0.62%  libstdc++.so.6.0.13   [.] std::string::reserve(unsigned long)

I'm running the benchmark directly on one of the nodes, which I know is not
optimal, but it can still deliver those 200k IOPS on an empty volume, so I
guess that shouldn't be the problem.

Random write performance is another story, and a truly poor one, but I'd like
to deal with read performance first.


So my question is: are those numbers normal? And if not, what should I check?

I'll be very grateful for any hints I can get.

thanks a lot in advance

nik


--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobile: +420 777 093 799
www.linuxbox.cz

mobile (service): +420 737 238 656
email (service):  ser...@linuxbox.cz
-------------------------------------
