On Fri, Oct 25, 2019 at 11:14 PM Maged Mokhtar <mmokh...@petasan.org> wrote:
> 3. vMotion between a Ceph datastore and an external datastore. This will be
> bad, and it seems to be the case you are testing. It is bad because between 2
> different storage systems (the IQNs are served by different targets), VAAI
> XCOPY cannot be used, so VMware does the copy itself. It moves data using a
> 64k block size, which gives low performance. To add some flavor, it does
> indeed use 32 threads, but unfortunately they use co-located addresses, which
> does not work well in Ceph: they all hit the same rbd object, which gets
> serialized due to PG locks, so you will not get any parallelization. Your
> speed will mostly be determined by serial 64k writes, so with 1 ms write
> latency on an SSD cluster you will get around 64 MB/s (64 KB per 1 ms round
> trip = 64 MB/s); it will be slightly higher as the extra threads have some
> small effect.

Yes, vmotion is the worst IO pattern ever for a sequential copy.

However, the situation you are describing can be fixed with RBD
striping v2: just make Ceph switch to another object every 64 KB, see
https://docs.ceph.com/docs/master/dev/file-striping/
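
A minimal sketch (pool and image names are made up, and the exact
numbers would need tuning): an image created with a 64 KB stripe unit
spreads consecutive 64 KB writes across stripe-count objects, so they
stop serializing on a single object's PG lock:

rbd create datastore/vmware-lun --size 1T --object-size 4M \
    --stripe-unit 64K --stripe-count 16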

I'm not sure about the state of striping v2 support in the kernel
module; last time I checked it wasn't supported. But ceph-iscsi/tcmu-runner
has gotten quite good over the past year, and I don't see any point in still
using the kernel data path for iscsi nowadays.
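
If you want to sanity-check which data path a gateway is using, something
like this on the gateway node (a sketch, assuming targetcli is installed)
should list the LUNs under a user:rbd backstore for tcmu-runner, rather
than a kernel block backstore:

targetcli ls /backstores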


Paul


>
> Note your esxtop does show 32 active IOs under ACTV. The QUED of zero is
> not the queue depth, but rather the "queued" IOs that ESX would suspend in
> case your active count reaches the adapter maximum (128).
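>
> (If that per-adapter maximum ever becomes the bottleneck, the software
> iSCSI initiator's LUN queue depth can be raised through its module
> parameter; a hedged sketch, assuming the stock iscsi_vmk software
> adapter, with the value needing testing on your side:
>
> esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=192
>
> followed by a host reboot.)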
>
> This is just to clarify: if case 3 is not your primary concern, then I would
> forget about it and benchmark 1 and 2 if they are relevant. Else, if 3 is
> important, I am not sure you can do much, as it is happening within
> VMware. Maybe there is a way to map the external IQN to be served by the
> same target serving the Ceph IQN; then there could be a chance the XCOPY
> could be activated. Mike would probably know if this has any chance of
> working :)
>
> /Maged
>
>
> On 25/10/2019 22:01, Ryan wrote:
>
> esxtop is showing a queue length of 0
>
> Storage motion to ceph
> DEVICE: naa.6001405ec60d8b82342404d929fbbd03   (PATH/WORLD/PARTITION: -)
> DQLEN 128   WQLEN -   ACTV 32   QUED 0   %USD 25   LOAD 0.25
> CMDS/s 1442.32   READS/s 0.18   WRITES/s 1440.50   MBREAD/s 0.00   MBWRTN/s 89.78
> DAVG/cmd 21.32   KAVG/cmd 0.01   GAVG/cmd 21.34   QAVG/cmd 0.01
>
> Storage motion from ceph
> DEVICE: naa.6001405ec60d8b82342404d929fbbd03   (PATH/WORLD/PARTITION: -)
> DQLEN 128   WQLEN -   ACTV 32   QUED 0   %USD 25   LOAD 0.25
> CMDS/s 4065.38   READS/s 4064.83   WRITES/s 0.36   MBREAD/s 253.52   MBWRTN/s 0.00
> DAVG/cmd 7.57   KAVG/cmd 0.01   GAVG/cmd 7.58   QAVG/cmd 0.00
>
> I tried using fio like you mentioned, but it hung at
> [r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS] and the ETA kept climbing. I ended up
> using rbd bench on the Ceph iSCSI gateway instead. With a 64K write
> workload I'm seeing ~400 MB/s transfers.
>
> # create, map, and mount a replicated test image
> rbd create test --size 100G --image-feature layering
> rbd map test
> mkfs.ext4 /dev/rbd/rbd/test
> mount /dev/rbd/rbd/test test
>
> # same, but with the data on an erasure-coded pool
> rbd create testec --size 100G --image-feature layering --data-pool rbd_ec
> rbd map testec
> mkfs.ext4 /dev/rbd/rbd/testec
> mount /dev/rbd/rbd/testec testec
>
> [root@ceph-iscsi1 mnt]# rbd bench --image test --io-size 64K --io-type write --io-total 10G
> bench  type write io_size 65536 io_threads 16 bytes 10737418240 pattern sequential
>   SEC       OPS   OPS/SEC   BYTES/SEC
>     1      6368   6377.59  417961796.64
>     2     12928   6462.27  423511630.71
>     3     19296   6420.18  420752986.78
>     4     26320   6585.61  431594792.67
>     5     33296   6662.37  436624891.04
>     6     40128   6754.67  442673957.25
>     7     46784   6765.75  443400452.26
>     8     53280   6809.02  446236110.93
>     9     60032   6739.67  441691068.73
>    10     66784   6698.91  439019550.77
>    11     73616   6690.88  438493253.66
>    12     80016   6654.35  436099640.00
>    13     85712   6485.07  425005611.11
>    14     91088   6202.49  406486113.46
>    15     96896   6021.17  394603137.62
>    16    102368   5741.19  376254347.24
>    17    107568   5501.57  360550910.38
>    18    113728   5603.17  367209502.58
>    19    120144   5820.48  381451245.32
>    20    126496   5917.60  387816078.53
>    21    132768   6089.71  399095466.00
>    22    139040   6306.98  413334431.09
>    23    145104   6276.42  411331743.63
>    24    151440   6256.67  410036891.68
>    25    157808   6261.12  410328554.98
>    26    163456   6140.03  402392725.36
> elapsed:    26  ops:   163840  ops/sec:  6271.36  bytes/sec: 410999626.38
>
> [root@ceph-iscsi1 mnt]# rbd bench --image testec --io-size 64K --io-type write --io-total 10G
> bench  type write io_size 65536 io_threads 16 bytes 10737418240 pattern sequential
>   SEC       OPS   OPS/SEC   BYTES/SEC
>     1      7392   7415.38  485974266.41
>     2     14464   7243.59  474715656.29
>     3     22000   7341.08  481104853.50
>     4     29408   7352.29  481839517.16
>     5     37296   7459.38  488857889.75
>     6     44864   7494.36  491150574.57
>     7     52848   7676.76  503104281.98
>     8     60784   7756.76  508347136.11
>     9     68608   7835.26  513491609.52
>    10     76784   7902.30  517885290.67
>    11     84544   7935.96  520091129.45
>    12     92432   7916.76  518832844.57
>    13    100064   7855.96  514848275.43
>    14    107040   7692.52  504136734.09
>    15    114320   7499.66  491497933.56
>    16    121744   7436.99  487390477.85
>    17    129664   7438.92  487517345.01
>    18    136704   7326.50  480149408.39
>    19    144960   7587.00  497221460.09
>    20    153264   7796.56  510955233.33
>    21    160832   7814.44  512126854.90
> elapsed:    21  ops:   163840  ops/sec:  7659.97  bytes/sec: 502004079.43
>
> On Fri, Oct 25, 2019 at 11:54 AM Mike Christie <mchri...@redhat.com> wrote:
>>
>> On 10/24/2019 11:47 PM, Ryan wrote:
>> > I'm using CentOS 7.7.1908 with kernel 3.10.0-1062.1.2.el7.x86_64. The
>> > workload was a VMware Storage Motion from a local SSD-backed datastore
>>
>> Ignore my comments. I thought you were just doing fio-like tests in the VM.
>>
>> > to the ceph backed datastore. Performance was measured using dstat on
>> > the iscsi gateway for network traffic, and via ceph status, as this
>> > cluster is basically idle. I changed max_data_area_mb to 256 and
>> > cmdsn_depth to 128. This appears to have given a slight improvement of
>> > maybe 10 MB/s.
>> >
>> > Moving VM to the ceph backed datastore
>> > io:
>> >     client:   124 KiB/s rd, 76 MiB/s wr, 95 op/s rd, 1.26k op/s wr
>> >
>> > Moving VM off the ceph backed datastore
>> >   io:
>> >     client:   344 MiB/s rd, 625 KiB/s wr, 5.54k op/s rd, 62 op/s wr
>> >
>>
>> If you run esxtop while running your test, what do you see for the number
>> of commands in the iscsi LUN's queue?
>>
>> > I'm going to test bonnie++ with an rbd volume mounted directly on the
>>
>> To try and isolate whether it's the iscsi layer or rbd, you need to run
>> fio with the librbd io engine. We know krbd is going to be the fastest;
>> ceph-iscsi uses librbd, so it is a better baseline. If you are not
>> familiar with fio you can just do something like:
>>
>> fio --group_reporting --ioengine=rbd --direct=1 --name=librbdtest \
>>     --numjobs=32 --bs=512k --iodepth=128 --size=10G --rw=write \
>>     --rbdname=name_of_your_image --pool=name_of_pool
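>>
>> (Depending on the cephx setup, the rbd engine may also need
>> --clientname=admin and a readable /etc/ceph/ceph.conf plus keyring on
>> the box running fio; fio sitting at 0 IOPS with a climbing ETA is often
>> an auth or image-name problem rather than slow storage.)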
>>
>>
>> > iscsi gateway. Also will test bonnie++ inside a VM on a ceph backed
>> > datastore.
>> >
>> > On Thu, Oct 24, 2019 at 7:15 PM Mike Christie <mchri...@redhat.com> wrote:
>> >
>> >     On 10/24/2019 12:22 PM, Ryan wrote:
>> >     > I'm in the process of testing the iscsi target feature of ceph. The
>> >     > cluster is running ceph 14.2.4 and ceph-iscsi 3.3. It consists of 5
>> >
>> >     What kernel are you using?
>> >
>> >     > hosts with 12 SSD OSDs per host. Some basic testing moving VMs to
>> >     > a ceph backed datastore is only showing 60MB/s transfers. However
>> >     > moving these back off the datastore is fast at 200-300MB/s.
>> >
>> >     What is the workload and what are you using to measure the throughput?
>> >
>> >     If you are using fio, what arguments are you using? And, could you
>> >     change the ioengine to rbd and re-run the test from the target
>> >     system so we can check if rbd is slow or iscsi?
>> >
>> >     For small IOs, 60 MB/s is about right.
>> >
>> >     For 128-512K IOs you should be able to get around 300 MB/s for writes
>> >     and 600 for reads.
>> >
>> >     1. Increase max_data_area_mb. This is a kernel buffer lio/tcmu uses to
>> >     pass data between the kernel and tcmu-runner. The default is only 8MB.
>> >
>> >     In gwcli cd to your disk and do:
>> >
>> >     # reconfigure max_data_area_mb N
>> >
>> >     where N is between 8 and 2048 (MB).
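>> >
>> >     For example, something like this (the disk name is made up; use
>> >     the pool/image path gwcli shows for your LUN):
>> >
>> >     /> cd /disks/rbd/vmware01
>> >     /disks/rbd/vmware01> reconfigure max_data_area_mb 256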
>> >
>> >     2. The Linux kernel target only allows 64 commands per iscsi session by
>> >     default. We increase that to 128, but you can increase this to 512.
>> >
>> >     In gwcli cd to the target dir and do
>> >
>> >     reconfigure cmdsn_depth 512
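>> >
>> >     For example (the target IQN is made up):
>> >
>> >     /> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
>> >     /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw> reconfigure cmdsn_depth 512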
>> >
>> >     3. I think ceph-iscsi and lio work better with higher queue depths,
>> >     so if you are using fio you want higher numjobs and/or iodepth
>> >     values.
>> >
>> >     >
>> >     > What should I be looking at to track down the write performance
>> >     > issue? In comparison with the Nimble Storage arrays I can see
>> >     > 200-300MB/s in both directions.
>> >     >
>> >     > Thanks,
>> >     > Ryan
>> >     >
>> >     >
>> >
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
