Timur, read this thread: https://www.mail-archive.com/[email protected]/msg12486.html
2014-10-01 16:24 GMT+04:00 Andrei Mikhailovsky <[email protected]>:
> Timur,
>
> As far as I know, the latest master has a number of improvements for ssd
> disks. If you check the mailing list discussion from a couple of weeks
> back, you can see that the latest stable firefly is not that well optimised
> for ssd drives and IO is limited. However, changes are being made to
> address that.
>
> I am quite surprised that you can get 10K IOPS, as in my tests I was not
> getting over 3K IOPS on ssd disks which are capable of doing 90K IOPS.
>
> P.S. Does anyone know if the ssd optimisation code will be added to the
> next maintenance release of firefly?
>
> Andrei
> ------------------------------
>
> *From: *"Timur Nurlygayanov" <[email protected]>
> *To: *"Christian Balzer" <[email protected]>
> *Cc: *[email protected]
> *Sent: *Wednesday, 1 October, 2014 1:11:25 PM
> *Subject: *Re: [ceph-users] Why performance of benchmarks with small
> blocks is extremely small?
>
> Hello Christian,
>
> Thank you for your detailed answer!
>
> I have another pre-production environment with 4 Ceph servers and 4 SSD
> disks per Ceph server (each Ceph OSD on a separate SSD disk).
> Should I move the journals to other disks, or is that not required in my
> case?
>
> [root@ceph-node ~]# mount | grep ceph
> /dev/sdb4 on /var/lib/ceph/osd/ceph-0 type xfs
> (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback)
> /dev/sde4 on /var/lib/ceph/osd/ceph-5 type xfs
> (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback)
> /dev/sdd4 on /var/lib/ceph/osd/ceph-2 type xfs
> (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback)
> /dev/sdc4 on /var/lib/ceph/osd/ceph-1 type xfs
> (rw,noexec,nodev,noatime,nodiratime,inode64,logbsize=256k,delaylog,user_xattr,data=writeback)
>
> [root@ceph-node ~]# find /var/lib/ceph/osd/ | grep journal
> /var/lib/ceph/osd/ceph-0/journal
> /var/lib/ceph/osd/ceph-5/journal
> /var/lib/ceph/osd/ceph-1/journal
> /var/lib/ceph/osd/ceph-2/journal
>
> My SSD disks do ~40K IOPS per disk, but on the VM I see only ~10K-14K
> IOPS for disk operations.
> To check this I execute the following command in a VM whose root
> partition is backed by a disk in the Ceph storage:
>
> root@test-io:/home/ubuntu# rm -rf /tmp/test && spew -d --write -r -b 4096 10M /tmp/test
> WTR: 56506.22 KiB/s Transfer time: 00:00:00 IOPS: 14126.55
>
> Is this the expected result, or can I improve the performance and get at
> least 30K-40K IOPS on the VM disks? (I have 2x 10Gb/s network interfaces
> in LACP bonding for the storage network, so the network shouldn't be the
> bottleneck.)
>
> Thank you!
>
> On Wed, Oct 1, 2014 at 6:50 AM, Christian Balzer <[email protected]> wrote:
>
>> Hello,
>>
>> [reduced to ceph-users]
>>
>> On Sat, 27 Sep 2014 19:17:22 +0400 Timur Nurlygayanov wrote:
>>
>> > Hello all,
>> >
>> > I installed OpenStack with Glance + Ceph OSD with replication factor 2
>> > and now I can see the write operations are extremely slow.
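A quick back-of-the-envelope check of the spew numbers quoted above (an editor's sketch, not output from any Ceph tool): for a single-threaded writer, throughput, block size and per-operation latency determine each other, so the gap between the observed ~14K IOPS and the SSD's raw ~40K IOPS is a latency gap, not a bandwidth one.

```python
# Tie together the figures spew reported above: 56506.22 KiB/s of
# 4 KiB writes is, by definition, one write every ~71 microseconds.

BLOCK_KIB = 4            # spew was run with -b 4096
WTR_KIB_S = 56506.22     # reported write throughput

iops = WTR_KIB_S / BLOCK_KIB    # ~14126.55, matches spew's own IOPS figure
latency_us = 1e6 / iops         # implied time per 4 KiB write, ~70.8 us

# To reach the raw SSD's ~40K IOPS with a single write in flight, the
# whole round trip (VM -> rbd -> primary OSD journal -> replica -> ack)
# would have to finish in 1e6 / 40000 = 25 us. That is why the
# single-stream result sits well below what the disk can do.
print(round(iops, 2), round(latency_us, 1))
```

The same arithmetic explains why moving journals or tuning mount options only shaves a few percent: every microsecond of the replication round trip counts directly against single-stream IOPS.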
>> > For example, I see only 0.04 MB/s write speed when I run rados bench
>> > with 512-byte blocks:
>> >
>> > rados bench -p test 60 write --no-cleanup -t 1 -b 512
>>
>> There are two things wrong with this test:
>>
>> 1. You're using rados bench, when in fact you should be testing from
>> within VMs. For starters, a VM could make use of the rbd cache you
>> enabled; rados bench won't.
>>
>> 2. Given the parameters of this test, you're measuring network latency
>> more than anything else. If you monitor the Ceph nodes (atop is a good
>> tool for that), you will probably see that neither CPU nor disk
>> resources are being exhausted. With a single thread, rados puts that
>> tiny block of 512 bytes on the wire, the primary OSD for the PG has to
>> write it to the journal (on your slow, non-SSD disks) and send it to
>> the secondary OSD, which has to ACK the write to its journal back to
>> the primary one, which in turn ACKs it to the client (rados bench), and
>> only then can rados bench send the next packet.
>> You get the drift.
>>
>> Using your parameters I get 0.17 MB/s on a pre-production cluster that
>> uses 4x QDR InfiniBand (IPoIB) connections; on my shitty test cluster
>> with 1Gb/s links I get similar results to yours, unsurprisingly.
>>
>> Ceph excels only with lots of parallelism, so an individual thread
>> might be slow (and in your case HAS to be slow, which has nothing to do
>> with Ceph per se), but many parallel threads will utilize the resources
>> available.
>>
>> Using data blocks that are adequately sized (4MB, the default rados
>> size) will help with bandwidth, and the rbd cache inside a properly
>> configured VM should make that happen.
>>
>> Of course, in most real-life scenarios you will run out of IOPS long
>> before you run out of bandwidth.
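Christian's latency argument can be checked with simple arithmetic, using only the numbers quoted in this thread (an illustrative sketch): at `-t 1` and `-b 512`, bandwidth is just block size divided by round-trip latency.

```python
# Work back from the reported 0.04 MB/s single-threaded rados bench result
# to the per-write round-trip latency it implies.

BLOCK = 512     # bytes, from -b 512
MBPS = 0.04     # reported average bandwidth with -t 1

ops_per_sec = MBPS * 1024 * 1024 / BLOCK   # ~82 replicated writes/s
latency_ms = 1000 / ops_per_sec            # ~12.2 ms per write

# ~12 ms per write is journal commit + replication + network round trips
# on 1Gb/s links with HDD journals. The disks and CPUs idle while each
# ack travels back, so the cure is concurrency, not more tuning.
print(int(ops_per_sec), round(latency_ms, 1))
```

The "avg lat" column of roughly 0.012 s in the bench output confirms the same figure from the other direction.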
>>
>> > Maintaining 1 concurrent writes of 512 bytes for up to 60 seconds or 0 objects
>> > Object prefix: benchmark_data_node-17.domain.tld_15862
>> >   sec Cur ops   started  finished   avg MB/s   cur MB/s   last lat    avg lat
>> >     0       0         0         0          0          0          -          0
>> >     1       1        83        82  0.0400341  0.0400391   0.008465  0.0120985
>> >     2       1       169       168  0.0410111  0.0419922   0.080433  0.0118995
>> >     3       1       240       239  0.0388959   0.034668   0.008052  0.0125385
>> >     4       1       356       355  0.0433309  0.0566406    0.00837  0.0112662
>> >     5       1       472       471  0.0459919  0.0566406   0.008343  0.0106034
>> >     6       1       550       549  0.0446735  0.0380859   0.036639  0.0108791
>> >     7       1       581       580  0.0404538  0.0151367   0.008614  0.0120654
>> >
>> > My test environment configuration:
>> > Hardware servers with 1Gb network interfaces, 64GB RAM and 16 CPU
>> > cores per node; HDDs are WDC WD5003ABYX-01WERA0.
>>
>> For anything production, consider faster network connections and SSD
>> journals.
>>
>> > OpenStack with 1 controller, 1 compute and 2 ceph nodes (ceph on
>> > separate nodes).
>> > CentOS 6.5, kernel 2.6.32-431.el6.x86_64.
>>
>> You will probably want a 3.14 or 3.16 kernel for various reasons.
>>
>> Regards,
>>
>> Christian
>>
>> > I tested several config options for optimization, like in
>> > /etc/ceph/ceph.conf:
>> >
>> > [default]
>> > ...
>> > osd_pool_default_pg_num = 1024
>> > osd_pool_default_pgp_num = 1024
>> > osd_pool_default_flag_hashpspool = true
>> > ...
>> > [osd]
>> > osd recovery max active = 1
>> > osd max backfills = 1
>> > filestore max sync interval = 30
>> > filestore min sync interval = 29
>> > filestore flusher = false
>> > filestore queue max ops = 10000
>> > filestore op threads = 16
>> > osd op threads = 16
>> > ...
>> > [client]
>> > rbd_cache = true
>> > rbd_cache_writethrough_until_flush = true
>> >
>> > and in /etc/cinder/cinder.conf:
>> >
>> > [DEFAULT]
>> > volume_tmp_dir=/tmp
>> >
>> > but as a result performance increased by only ~30%, which does not
>> > look like a huge success.
>> >
>> > Non-default mount options and TCP tuning increase the speed by about
>> > 1%:
>> >
>> > [root@node-17 ~]# mount | grep ceph
>> > /dev/sda4 on /var/lib/ceph/osd/ceph-0 type xfs
>> > (rw,noexec,nodev,noatime,nodiratime,user_xattr,data=writeback,barrier=0)
>> >
>> > [root@node-17 ~]# cat /etc/sysctl.conf
>> > net.core.rmem_max = 16777216
>> > net.core.wmem_max = 16777216
>> > net.ipv4.tcp_rmem = 4096 87380 16777216
>> > net.ipv4.tcp_wmem = 4096 65536 16777216
>> > net.ipv4.tcp_window_scaling = 1
>> > net.ipv4.tcp_timestamps = 1
>> > net.ipv4.tcp_sack = 1
>> >
>> > Do we have other ways to significantly improve Ceph storage
>> > performance?
>> > Any feedback and comments are welcome!
>> >
>> > Thank you!
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> [email protected]   	Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>
> --
> Timur,
> QA Engineer
> OpenStack Projects
> Mirantis Inc
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards,
Irek Nurgayazovich Fasikhov
Mob.: +79229045757
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
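As an editor's footnote to Christian's parallelism point: a deliberately simplified model (the 12 ms latency and 40K-IOPS ceiling are taken from this thread; real clusters queue and contend, so treat the linearity as a sketch, not a prediction) shows why many concurrent writers recover the IOPS a single thread cannot.

```python
# Toy model: each latency-bound client thread sustains 1/latency ops/s,
# so aggregate IOPS grows roughly linearly with concurrency until some
# backend limit (disks, CPU, network) saturates.

def aggregate_iops(threads: int, latency_s: float, backend_limit: float) -> float:
    """Hypothetical aggregate IOPS for `threads` latency-bound writers."""
    return min(threads / latency_s, backend_limit)

# 0.012 s is the avg lat from the rados bench output above; 40000 stands
# in for the hardware ceiling Timur measured on his SSDs.
for t in (1, 16, 64, 512):
    print(t, int(aggregate_iops(t, 0.012, 40000)))
```

One thread is stuck near the ~83 ops/s seen in the bench, while hundreds of threads (many VMs, or rados bench with a large `-t`) can approach the hardware limit, which is the behaviour Christian describes.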
