Because you do not have segregated networks, the cluster traffic is most likely 
drowning out the fio client traffic.  This is exacerbated by the fact that 
there is only a 1 Gb/s link between the cluster nodes.

If you are planning on using this cluster for anything other than testing, 
you’ll want to re-evaluate your network architecture.

+ >= 10 GbE links
+ A dedicated cluster network
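A rough back-of-envelope (my numbers, assuming the default 3x replication and ignoring protocol overhead) shows how little client bandwidth is left on a shared 1 Gb/s link:

```python
# Back-of-envelope: every client write is amplified by the replication
# factor on the wire, and without a separate cluster network that replica
# traffic competes with client traffic on the same link.
link_gbps = 1.0        # node link speed, 1 Gb/s as in this thread
replication = 3        # assumption: Ceph's default pool size of 3
link_MBps = link_gbps * 1000 / 8             # ~125 MB/s line rate
client_write_MBps = link_MBps / replication  # rough ceiling for client writes
print(f"~{client_write_MBps:.1f} MB/s client write ceiling")
```

With a dedicated cluster network the replica traffic moves off the client-facing link, which is why it is the standard recommendation for anything beyond testing.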


From: Mario Rodríguez Molins <[email protected]>
Date: Wednesday, October 5, 2016 at 8:38 AM
To: "Will.Boege" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [EXTERNAL] [ceph-users] Benchmarks using fio tool gets stuck

Hi,

Currently we do not have a separate cluster network, and our setup is:
 - 3 OSD nodes with 1 Gbps links. Each node runs a single OSD daemon, although 
we plan to increase the number of OSDs per host.
 - 3 virtual machines, also with 1 Gbps links, each running one monitor daemon 
(two of them also run a metadata server).
 - The two clients used for testing purposes are also VMs.

In each run of fio, we perform the following steps (all of them on the client):
 1. Create a 1 GB rbd image within a pool and map it to a block device
 2. Create an ext4 filesystem on this block device
 3. Unmap the device from the client
 4. Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches && sync)
 5. Run the fio test, setting the pool and the name of the rbd image. The 
block size used changes in each run.
 6. Remove the image from the pool
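Sketched as a script, the steps above look roughly like this. This is a dry-run sketch: the pool and image names are taken from the fio snippet further down the thread, and the helper only prints each command; replace the echo with "$@" to actually execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of one benchmark iteration; prints each command instead
# of executing it.  Pool/image names match the fio snippet in this thread.
set -u

POOL=scbench
IMAGE=image01

run() {                       # swap the echo for "$@" to run for real
  PLANNED="${PLANNED:-} $*"
  echo "+ $*"
}

run rbd create "$POOL/$IMAGE" --size 1024     # 1. create a 1 GB image
run rbd map "$POOL/$IMAGE"                    #    ...and map it
run mkfs.ext4 "/dev/rbd/$POOL/$IMAGE"         # 2. make the filesystem
run rbd unmap "/dev/rbd/$POOL/$IMAGE"         # 3. unmap from the client
run sync                                      # 4. flush, then drop caches
run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
# 5. the fio run itself goes here (see the loop further down the thread)
run rbd rm "$POOL/$IMAGE"                     # 6. remove the image
```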



Thanks in advance!

On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege 
<[email protected]> wrote:
What does your network setup look like?  Do you have a separate cluster network?

Can you explain how you are performing the FIO test? Are you mounting a volume 
through krbd and testing that from a different server?

On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins 
<[email protected]> wrote:
Hello,

We are setting up a new Ceph cluster and running some benchmarks on it.
At the moment, our cluster consists of:
 - 3 OSD nodes. In our current configuration, one daemon per node.
 - 3 monitor (MON) nodes. Two of these nodes also run a metadata server (MDS).

Benchmarks are performed with the tools that ceph/rados provides, as well as 
with the fio benchmarking tool.
Our benchmark tests are based on this tutorial: 
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance

We are having some issues with fio: after some executions, the fio process 
gets stuck in a futex_wait_queue_me call:
# cat /proc/14413/stack
[<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140
[<ffffffffa7af74bf>] futex_wait+0xff/0x260
[<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60
[<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930
[<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20
[<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0
[<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0
[<ffffffffa7af98c3>] SyS_futex+0x83/0x180
[<ffffffffa7a63981>] __do_page_fault+0x221/0x510
[<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96
[<ffffffffffffffff>] 0xffffffffffffffff

The logs of the osd and mon daemons do not show any information or errors 
about what the problem could be.

Tracing the fio process with strace shows the following:

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632811, {1475609725, 348199000}, ffffffff <unfinished ...>
[pid 14429] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
[pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
79103, {1475609727, 127563261}, ffffffff <unfinished ...>
[pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632819, {1475609726, 348199000}, ffffffff <unfinished ...>
[pid 14418] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
[pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
31641, {1475609731, 103526543}, ffffffff <unfinished ...>
[pid 14419] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
....

[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
[pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 
2147483647, 0x7c8b60, 15902 <unfinished ...>
[pid 14425] <... futex resumed> )       = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 14423] <... futex resumed> )       = 1
[pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 14425] <... futex resumed> )       = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
[pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
{"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) 
= 9
[pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished ...>
[pid 14423] <... futex resumed> )       = 1
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0
[pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
15823, {1475609738, 731811246}, ffffffff <unfinished ...>
[pid 14426] <... restart_syscall resumed> ) = 1
[pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096, MSG_DONTWAIT, 
NULL, NULL) = 9
[pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0
[pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>
[pid 14417] <... futex resumed> )       = 0
[pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished ...>
[pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)


This issue has appeared on both of our clients. They run Debian Jessie, each 
one with a different kernel:
 - kernel 3.16.7-ckt25-2+deb8u3
 - kernel 4.7.2-1~bpo8+1
The following package combinations have been used on both clients:
 - Ceph cluster 10.2.2 & fio 2.1.11-2
 - Ceph cluster 10.2.3 & fio 2.1.11-2
 - Ceph cluster 10.2.3 & fio 2.14

We launch fio varying different settings such as block size and operation 
type.
This is a simplified snippet of the shell script used:

for operation in read write randread randwrite; do
  for rbd in 4K 64K 1M 4M; do
    for bs in 4k 64k 1M 4M ; do
      # create rbd image with block size $rbd
      # drop caches

      fio --name=global \
      --ioengine=rbd \
      --clientname=admin \
      --pool=scbench \
      --rbdname=image01 \
      --bs=${bs} \
      --name=rbd_iodeph32 \
      --iodepth=32 \
      --rw=${operation} \
      --output-format=json

      sleep 10
      # delete rbd image
    done
  done
done



Any ideas why this could be happening? Are we missing some settings in fio?

Regards,


--
Mario Rodríguez
SRE
[email protected]

+34 914 294 039 — 645 756 437
C/ Gran Vía, nº 28, 6ª planta — 28013 Madrid
Tuenti Technologies, S.L.
www.tuenti.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



