Adding the parameter "--direct=1" to the fio command line, it no longer gets
stuck.
This is how my script looks now:
for operation in read write randread randwrite; do
  for rbd in 4K 64K 1M 4M; do
    for bs in 4k 64k 1M 4M; do
      # - create rbd image with block size $rbd
      fio --name=global \
          --ioengine=rbd \
          --clientname=admin \
          --pool=scbench \
          --rbdname=image01 \
          --exec_prerun="echo 3 > /proc/sys/vm/drop_caches && sync" \
          --bs=${bs} \
          --name=rbd_iodeph32 \
          --iodepth=32 \
          --direct=1 \
          --rw=${operation} \
          --output-format=json
      sleep 10
      # - delete rbd image
    done
  done
done
On Wed, Oct 5, 2016 at 5:09 PM, Mario Rodríguez Molins <
[email protected]> wrote:
> Doing some tests using iperf, our network has a bandwidth between nodes of
> 940 Mbits/sec.
> According to our metrics of network use in this cluster, hosts with OSDs
> have a peak traffic of about 200 Mbits/sec each, and the client running
> FIO about 300 Mbits/sec.
> The network does not seem to be saturated.
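> For reference, a minimal sketch of how such an iperf check is run (the
> exact invocation here is an assumption; the flags shown exist in both
> iperf2 and iperf3):
>
> # on one node, start the server side
> iperf -s
> # on another node, run a timed test against it
> iperf -c <other-host> -t 30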
>
> On Wed, Oct 5, 2016 at 4:16 PM, Will.Boege <[email protected]> wrote:
>
>> Because you do not have segregated networks, the cluster traffic is most
>> likely drowning out the FIO user traffic. This is especially exacerbated
>> by the fact that there is only a 1Gb link between the cluster nodes.
>>
>>
>>
>> If you are planning on using this cluster for anything other than
>> testing, you’ll want to re-evaluate your network architecture.
>>
>>
>>
>> + >= 10GbE
>>
>> + Dedicated cluster network
>>
>>
>>
>>
>>
>> *From: *Mario Rodríguez Molins <[email protected]>
>> *Date: *Wednesday, October 5, 2016 at 8:38 AM
>> *To: *"Will.Boege" <[email protected]>
>> *Cc: *"[email protected]" <[email protected]>
>> *Subject: *Re: [EXTERNAL] [ceph-users] Benchmarks using fio tool gets
>> stuck
>>
>>
>>
>> Hi,
>>
>>
>>
>> Currently, we do not have a separate cluster network and our setup is:
>>
>> - 3 nodes for OSDs with 1Gbps links. Each node is running a single OSD
>> daemon, although we plan to increase the number of OSDs per host.
>>
>> - 3 virtual machines, also with 1Gbps links, where each VM is running one
>> monitor daemon (two of them are running a metadata server too).
>>
>> - The two clients used for testing purposes are also VMs.
>>
>>
>>
>> In each run of the FIO tool, we do the following steps (all of them on
>> the client); a rough sketch of the equivalent commands follows the list:
>>
>> 1.- Create an rbd image of 1 GB within a pool and map this image to a
>> block device
>>
>> 2.- Create an ext4 filesystem on this block device
>>
>> 3.- Unmap the device from the client
>>
>> 4.- Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches
>> && sync)
>>
>> 5.- Perform the fio test, setting the pool and name of the rbd image. In
>> each run, the block size used is changed.
>>
>> 6.- Remove the image from the pool
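>>
>> A minimal sketch of those per-run steps, assuming the standard rbd/krbd
>> CLI, the scbench pool and the image01 image used in the fio script below
>> (names are illustrative):
>>
>> rbd create scbench/image01 --size 1024          # 1 GB image
>> DEV=$(rbd map scbench/image01)                  # map to a block device via krbd
>> mkfs.ext4 "${DEV}"                              # create the ext4 filesystem
>> rbd unmap "${DEV}"                              # unmap from the client
>> echo 3 | tee /proc/sys/vm/drop_caches && sync   # drop caches
>> fio --ioengine=rbd --pool=scbench --rbdname=image01 --bs=${bs} ...   # full options in the script below
>> rbd rm scbench/image01                          # remove the image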
>>
>>
>>
>>
>>
>>
>>
>> Thanks in advance!
>>
>>
>>
>> On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege <[email protected]> wrote:
>>
>> What does your network setup look like? Do you have a separate cluster
>> network?
>>
>>
>>
>> Can you explain how you are performing the FIO test? Are you mounting a
>> volume through krbd and testing that from a different server?
>>
>>
>> On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins <
>> [email protected]> wrote:
>>
>> Hello,
>>
>>
>>
>> We are setting a new cluster of Ceph and doing some benchmarks on it.
>>
>> At this moment, our cluster consists of:
>>
>> - 3 nodes for OSD. In our current configuration one daemon per node.
>>
>> - 3 nodes for monitors (MON). In two of these nodes, there is a metadata
>> server (MDS).
>>
>>
>>
>> Benchmarks are performed with tools that ceph/rados provides us as well
>> as with fio benchmark tool.
>>
>> Our benchmark tests are based on this tutorial:
>> http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
>>
>>
>>
>> Using the fio benchmark tool, we are having some issues. After some
>> executions, the fio process gets stuck in a futex_wait_queue_me call:
>>
>> # cat /proc/14413/stack
>>
>> [<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140
>>
>> [<ffffffffa7af74bf>] futex_wait+0xff/0x260
>>
>> [<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60
>>
>> [<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930
>>
>> [<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20
>>
>> [<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0
>>
>> [<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0
>>
>> [<ffffffffa7af98c3>] SyS_futex+0x83/0x180
>>
>> [<ffffffffa7a63981>] __do_page_fault+0x221/0x510
>>
>> [<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96
>>
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>>
>>
>> The logs of the OSD and MON daemons do not show any information or errors
>> about what the problem could be.
>>
>>
>>
>> Running strace against the fio process shows the following:
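>>
>> A sketch of the strace invocation used to capture it (the exact options
>> are an assumption; -f is needed to follow the fio worker threads):
>>
>> strace -f -p 14413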
>>
>>
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
>>
>> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
>>
>> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632811, {1475609725, 348199000}, ffffffff <unfinished ...>
>>
>> [pid 14429] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
>>
>> [pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
>>
>> [pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME,
>> 79103, {1475609727, 127563261}, ffffffff <unfinished ...>
>>
>> [pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
>>
>> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
>>
>> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
>>
>> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
>>
>> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
>>
>> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
>>
>> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
>>
>> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
>>
>> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
>>
>> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
>> 632819, {1475609726, 348199000}, ffffffff <unfinished ...>
>>
>> [pid 14418] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> [pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
>>
>> [pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
>>
>> [pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME,
>> 31641, {1475609731, 103526543}, ffffffff <unfinished ...>
>>
>> [pid 14419] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>> ....
>>
>>
>>
>> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
>>
>> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
>>
>> [pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647,
>> 0x7c8b60, 15902 <unfinished ...>
>>
>> [pid 14425] <... futex resumed> ) = 0
>>
>> [pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>
>> [pid 14423] <... futex resumed> ) = 1
>>
>> [pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>
>> [pid 14425] <... futex resumed> ) = 0
>>
>> [pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
>>
>> [pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
>>
>> [pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1},
>> {"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0},
>> MSG_NOSIGNAL) = 9
>>
>> [pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished
>> ...>
>>
>> [pid 14423] <... futex resumed> ) = 1
>>
>> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0
>>
>> [pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0
>>
>> [pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME,
>> 15823, {1475609738, 731811246}, ffffffff <unfinished ...>
>>
>> [pid 14426] <... restart_syscall resumed> ) = 1
>>
>> [pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096,
>> MSG_DONTWAIT, NULL, NULL) = 9
>>
>> [pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0
>>
>> [pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished
>> ...>
>>
>> [pid 14417] <... futex resumed> ) = 0
>>
>> [pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0
>>
>> [pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished
>> ...>
>>
>> [pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>> out)
>>
>>
>>
>>
>>
>> This issue has appeared on both of our clients. They are running Debian
>> Jessie, each with a different kernel:
>>
>> - kernel 3.16.7-ckt25-2+deb8u3
>> - kernel 4.7.2-1~bpo8+1
>>
>> The following package versions have been used on both clients:
>>
>> - Ceph cluster 10.2.2 & FIO 2.1.11-2
>>
>> - Ceph cluster 10.2.3 & FIO 2.1.11-2
>>
>> - Ceph cluster 10.2.3 & FIO 2.14
>>
>>
>>
>> We launch the fio tool varying different settings, such as block size and
>> operation type.
>> This is a simplified snippet of the shell script used:
>>
>>
>>
>> for operation in read write randread randwrite; do
>>   for rbd in 4K 64K 1M 4M; do
>>     for bs in 4k 64k 1M 4M; do
>>       # create rbd image with block size $rbd
>>       # drop caches
>>       fio --name=global \
>>           --ioengine=rbd \
>>           --clientname=admin \
>>           --pool=scbench \
>>           --rbdname=image01 \
>>           --bs=${bs} \
>>           --name=rbd_iodeph32 \
>>           --iodepth=32 \
>>           --rw=${operation} \
>>           --output-format=json
>>       sleep 10
>>       # delete rbd image
>>     done
>>   done
>> done
>>
>> Any ideas why this could be happening? Are we missing some settings in the
>> fio tool?
>>
>>
>>
>> Regards,
>>
>>
>>
>>
>>
--
*Mario Rodríguez*
SRE
[email protected]
+34 914 294 039 — 645 756 437
C/ Gran Vía, nº 28, 6ª planta — 28013 Madrid
Tuenti Technologies, S.L.
www.tuenti.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com