On Fri, Oct 25, 2019 at 11:14 PM Maged Mokhtar <mmokh...@petasan.org> wrote:

> 3. vmotion between a Ceph datastore and an external datastore.. this will
> be bad. This seems to be the case you are testing. It is bad because
> between 2 different storage systems (iqns are served on different
> targets), VAAI xcopy cannot be used and vmware does its own stuff. It
> moves data using a 64k block size, which gives low performance.. to add
> some flavor, it does indeed use 32 threads, but unfortunately they use
> co-located addresses, which does not work well in Ceph: they all hit the
> same rbd object, which gets serialized due to pg locks, so you will not
> get any parallelization. Your speed will mostly be determined by a serial
> 64k stream, so with 1 ms write latency on an SSD cluster you will get
> around 64 MB/s.. it will be slightly higher, as the extra threads have
> some small effect.
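(To make that figure concrete: a fully serialized stream completes one
64 KiB write per round trip, so at ~1 ms per write the ceiling is roughly
64 KiB / 1 ms = 64 MiB/s; the extra in-flight threads only shave a little
off the effective per-write latency.)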
Yes, vmotion is the worst IO pattern ever for a sequential copy. However,
the situation you are describing can be fixed with RBD striping v2: just
make Ceph switch to another object every 64 KB, see
https://docs.ceph.com/docs/master/dev/file-striping/
(a sketch of such an image follows below).

I'm not sure about the state of striping v2 support in the kernel module;
last time I checked it wasn't supported. But ceph-iscsi/tcmu-runner got
quite good over the past year, so I don't see any point in still using the
kernel data path for iscsi nowadays.

Paul

> Note your esxtop does show 32 active ios under ACTV. The QUED of zero is
> not the queue depth, but rather the "queued" io that ESX would suspend
> once your active count reaches the maximum per adapter (128).
>
> This is just to clarify: if case 3 is not your primary concern, then I
> would forget about it and benchmark 1 and 2 if they are relevant. Else,
> if 3 is important, I am not sure you can do much, as it is happening
> within vmware.. maybe there could be a way to map the external iqn to be
> served by the same target serving the Ceph iqn, then there could be a
> chance the xcopy could be activated.. Mike would probably know if this
> has any chance of working :)
>
> /Maged
>
>
> On 25/10/2019 22:01, Ryan wrote:
>
> esxtop is showing a queue length of 0
>
> Storage motion to ceph
> DEVICE                               PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD  CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
> naa.6001405ec60d8b82342404d929fbbd03 -                      128     -   32    0   25 0.25 1442.32    0.18  1440.50     0.00    89.78    21.32     0.01    21.34     0.01
>
> Storage motion from ceph
> DEVICE                               PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD  CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
> naa.6001405ec60d8b82342404d929fbbd03 -                      128     -   32    0   25 0.25 4065.38 4064.83     0.36   253.52     0.00     7.57     0.01     7.58     0.00
>
> I tried using fio like you mentioned, but it was hanging at
> [r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS] and the ETA kept climbing. I ended up
> using rbd bench on the ceph iscsi gateway. With a 64K write workload I'm
> seeing 400 MB/s transfers.
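(Picking up the striping v2 suggestion above: a minimal sketch of what such
an image could look like. The image name, size, and stripe parameters here
are illustrative rather than a tested recommendation, and per the note
above the kernel client may not support them; also verify that your
ceph-iscsi version accepts the resulting feature set.)

    # Switch objects every 64 KiB: a 64 KiB stripe unit striped across
    # 16 objects of the default 4 MiB object size.
    rbd create vmware-lun0 --size 1T --object-size 4M \
        --stripe-unit 64K --stripe-count 16
    rbd info vmware-lun0    # verify stripe_unit/stripe_count took effect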
>
> rbd create test --size 100G --image-feature layering
> rbd map test
> mkfs.ext4 /dev/rbd/rbd/test
> mount /dev/rbd/rbd/test test
>
> rbd create testec --size 100G --image-feature layering --data-pool rbd_ec
> rbd map testec
> mkfs.ext4 /dev/rbd/rbd/testec
> mount /dev/rbd/rbd/testec testec
>
> [root@ceph-iscsi1 mnt]# rbd bench --image test --io-size 64K --io-type write --io-total 10G
> bench  type write  io_size 65536  io_threads 16  bytes 10737418240  pattern sequential
>   SEC       OPS   OPS/SEC     BYTES/SEC
>     1      6368   6377.59  417961796.64
>     2     12928   6462.27  423511630.71
>     3     19296   6420.18  420752986.78
>     4     26320   6585.61  431594792.67
>     5     33296   6662.37  436624891.04
>     6     40128   6754.67  442673957.25
>     7     46784   6765.75  443400452.26
>     8     53280   6809.02  446236110.93
>     9     60032   6739.67  441691068.73
>    10     66784   6698.91  439019550.77
>    11     73616   6690.88  438493253.66
>    12     80016   6654.35  436099640.00
>    13     85712   6485.07  425005611.11
>    14     91088   6202.49  406486113.46
>    15     96896   6021.17  394603137.62
>    16    102368   5741.19  376254347.24
>    17    107568   5501.57  360550910.38
>    18    113728   5603.17  367209502.58
>    19    120144   5820.48  381451245.32
>    20    126496   5917.60  387816078.53
>    21    132768   6089.71  399095466.00
>    22    139040   6306.98  413334431.09
>    23    145104   6276.42  411331743.63
>    24    151440   6256.67  410036891.68
>    25    157808   6261.12  410328554.98
>    26    163456   6140.03  402392725.36
> elapsed: 26  ops: 163840  ops/sec: 6271.36  bytes/sec: 410999626.38
>
> [root@ceph-iscsi1 mnt]# rbd bench --image testec --io-size 64K --io-type write --io-total 10G
> bench  type write  io_size 65536  io_threads 16  bytes 10737418240  pattern sequential
>   SEC       OPS   OPS/SEC     BYTES/SEC
>     1      7392   7415.38  485974266.41
>     2     14464   7243.59  474715656.29
>     3     22000   7341.08  481104853.50
>     4     29408   7352.29  481839517.16
>     5     37296   7459.38  488857889.75
>     6     44864   7494.36  491150574.57
>     7     52848   7676.76  503104281.98
>     8     60784   7756.76  508347136.11
>     9     68608   7835.26  513491609.52
>    10     76784   7902.30  517885290.67
>    11     84544   7935.96  520091129.45
>    12     92432   7916.76  518832844.57
>    13    100064   7855.96  514848275.43
>    14    107040   7692.52  504136734.09
>    15    114320   7499.66  491497933.56
>    16    121744   7436.99  487390477.85
>    17    129664   7438.92  487517345.01
>    18    136704   7326.50  480149408.39
>    19    144960   7587.00  497221460.09
>    20    153264   7796.56  510955233.33
>    21    160832   7814.44  512126854.90
> elapsed: 21  ops: 163840  ops/sec: 7659.97  bytes/sec: 502004079.43
>
> On Fri, Oct 25, 2019 at 11:54 AM Mike Christie <mchri...@redhat.com> wrote:
>>
>> On 10/24/2019 11:47 PM, Ryan wrote:
>> > I'm using CentOS 7.7.1908 with kernel 3.10.0-1062.1.2.el7.x86_64. The
>> > workload was a VMware Storage Motion from a local SSD backed datastore
>>
>> Ignore my comments. I thought you were just doing fio-like tests in the VM.
>>
>> > to the ceph backed datastore. Performance was measured using dstat on
>> > the iscsi gateway for network traffic, and ceph status, as this
>> > cluster is basically idle. I changed max_data_area_mb to 256 and
>> > cmdsn_depth to 128. This appears to have given a slight improvement of
>> > maybe 10 MB/s.
>> >
>> > Moving VM to the ceph backed datastore
>> >   io:
>> >     client: 124 KiB/s rd, 76 MiB/s wr, 95 op/s rd, 1.26k op/s wr
>> >
>> > Moving VM off the ceph backed datastore
>> >   io:
>> >     client: 344 MiB/s rd, 625 KiB/s wr, 5.54k op/s rd, 62 op/s wr
>>
>> If you run esxtop while running your test, what do you see for the
>> number of commands in the iscsi LUN's queue?
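(To capture those counters over a whole storage motion rather than watching
interactively, esxtop's batch mode can log them to CSV — a sketch, with the
delay and sample count chosen arbitrarily:)

    # On the ESXi host: one sample every 2 s, 60 samples; then inspect
    # the DQLEN/ACTV/QUED columns for the LUN in the resulting CSV.
    esxtop -b -d 2 -n 60 > /tmp/esxtop-vmotion.csv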
>>
>> > I'm going to test bonnie++ with an rbd volume mounted directly on the
>>
>> To try and isolate whether it's iscsi or rbd, you need to run fio with
>> the librbd io engine. We know krbd is going to be the fastest; ceph-iscsi
>> uses librbd, so it is a better baseline. If you are not familiar with
>> fio, you can just do something like:
>>
>> fio --group_reporting --ioengine=rbd --direct=1 --name=librbdtest \
>>     --numjobs=32 --bs=512k --iodepth=128 --size=10G --rw=write \
>>     --rbdname=name_of_your_image --pool=name_of_pool
>>
>> > iscsi gateway. Also will test bonnie++ inside a VM on a ceph backed
>> > datastore.
>> >
>> > On Thu, Oct 24, 2019 at 7:15 PM Mike Christie <mchri...@redhat.com
>> > <mailto:mchri...@redhat.com>> wrote:
>> >
>> >     On 10/24/2019 12:22 PM, Ryan wrote:
>> >     > I'm in the process of testing the iscsi target feature of ceph.
>> >     > The cluster is running ceph 14.2.4 and ceph-iscsi 3.3. It
>> >     > consists of 5
>> >
>> >     What kernel are you using?
>> >
>> >     > hosts with 12 SSD OSDs per host. Some basic testing moving VMs
>> >     > to a ceph backed datastore is only showing 60MB/s transfers.
>> >     > However moving these back off the datastore is fast at
>> >     > 200-300MB/s.
>> >
>> >     What is the workload, and what are you using to measure the
>> >     throughput?
>> >
>> >     If you are using fio, what arguments are you using? And, could you
>> >     change the ioengine to rbd and re-run the test from the target
>> >     system so we can check if rbd is slow or iscsi?
>> >
>> >     For small IOs, 60 is about right.
>> >
>> >     For 128-512K IOs you should be able to get around 300 MB/s for
>> >     writes and 600 for reads.
>> >
>> >     1. Increase max_data_area_mb. This is a kernel buffer lio/tcmu
>> >     uses to pass data between the kernel and tcmu-runner. The default
>> >     is only 8MB.
>> >
>> >     In gwcli cd to your disk and do:
>> >
>> >     # reconfigure max_data_area_mb N
>> >
>> >     where N is between 8 and 2048 MBs.
>> >
>> >     2. The Linux kernel target only allows 64 commands per iscsi
>> >     session by default. We increase that to 128, but you can increase
>> >     this to 512.
>> >
>> >     In gwcli cd to the target dir and do:
>> >
>> >     reconfigure cmdsn_depth 512
>> >
>> >     3. I think ceph-iscsi and lio work better with higher queue
>> >     depths, so if you are using fio you want higher numjobs and/or
>> >     iodepths.
>> >
>> >     > What should I be looking at to track down the write performance
>> >     > issue? In comparison, with the Nimble Storage arrays I can see
>> >     > 200-300MB/s in both directions.
>> >     >
>> >     > Thanks,
>> >     > Ryan
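(Pulling the gwcli tuning steps quoted above into a single illustrative
session. The pool, image, and target names here are made up, and the exact
directory layout can differ between ceph-iscsi versions, so treat the
paths as a sketch rather than exact syntax:)

    # gwcli
    /> cd /disks/rbd/vmware-lun0
    /disks/rbd/vmware-lun0> reconfigure max_data_area_mb 256
    /disks/rbd/vmware-lun0> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
    /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw> reconfigure cmdsn_depth 512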