Unfortunately, even after removing all my kernel command-line tuning, the performance did not improve.
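One sanity check worth running at this point (my suggestion, not something from the thread): edits to GRUB_CMDLINE_LINUX in /etc/default/grub only take effect after the GRUB config is regenerated (grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL/CentOS BIOS systems) and the host rebooted, so it pays to confirm what the running kernel was actually booted with:

```shell
# Show the command line the *running* kernel actually booted with.
cat /proc/cmdline

# If any of the removed tuning flags still show up, the regenerated
# config was never applied (flag names are the ones quoted in this thread):
grep -o 'intel_pstate=disable\|max_cstate=[0-9]*\|idle=poll\|numa=off' /proc/cmdline \
    || echo "old tuning flags are gone"
```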
Currently:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1"

Before:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1 intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll numa=off"

This is extremely puzzling - any ideas or suggestions for troubleshooting it will be GREATLY appreciated.

Steven

On 2 February 2018 at 10:51, Steven Vacaroaia <ste...@gmail.com> wrote:
> Hi Mark,
>
> Thanks
> My pools are using replication = 2
>
> I'll re-enable numa and report back
>
> Steven
>
> On 2 February 2018 at 10:48, Marc Roos <m.r...@f1-outsourcing.eu> wrote:
>
>>
>> Not sure if this info is of any help; please be aware I am also just
>> in a testing phase with ceph.
>>
>> I don't know how numa=off is interpreted by the OS. If it just hides
>> the NUMA topology, you could still run into the 'known issues'. That
>> is why I have numad running.
>> Furthermore, I have put an OSD 'out', which also shows a 0 in the
>> reweight column. So I guess your osd.1 is also not participating? If
>> so, would it not be a problem that you are testing 3x replication
>> with 2 disks?
>>
>>
>> I have got this on SATA 5400rpm disks, replicated pool size 3.
>>
>> rados bench -p rbd 30 write --id rbd
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>    20      16       832       816   163.178       180     0.157838    0.387074
>>    21      16       867       851   162.073       140     0.157289     0.38817
>>    22      16       900       884   160.705       132     0.224024    0.393674
>>    23      16       953       937   162.934       212     0.530274    0.388189
>>    24      16       989       973   162.144       144     0.209806    0.389644
>>    25      16      1028      1012   161.898       156     0.118438    0.391057
>>    26      16      1067      1051    161.67       156     0.248463     0.38977
>>    27      16      1112      1096   162.348       180     0.754184    0.392159
>>    28      16      1143      1127   160.977       124     0.439342    0.393641
>>    29      16      1185      1169   161.219       168    0.0801006    0.393004
>>    30      16      1221      1205   160.644       144     0.224278     0.39363
>> Total time run:         30.339270
>> Total writes made:      1222
>> Write size:             4194304
>> Object size:            4194304
>> Bandwidth (MB/sec):     161.111
>> Stddev Bandwidth:       24.6819
>> Max bandwidth (MB/sec): 212
>> Min bandwidth (MB/sec): 120
>> Average IOPS:           40
>> Stddev IOPS:            6
>> Max IOPS:               53
>> Min IOPS:               30
>> Average Latency(s):     0.396239
>> Stddev Latency(s):      0.249998
>> Max latency(s):         1.29482
>> Min latency(s):         0.06875
>>
>>
>> -----Original Message-----
>> From: Steven Vacaroaia [mailto:ste...@gmail.com]
>> Sent: Friday, 2 February 2018 15:25
>> To: ceph-users
>> Subject: [ceph-users] ceph luminous performance - disks at 100%, low
>> network utilization
>>
>> Hi,
>>
>> I have been struggling to get my test cluster to behave (from a
>> performance perspective).
>> Dell R620, 64 GB RAM, 1 CPU, numa=off, PERC H710, RAID 0, enterprise
>> 10K disks
>>
>> No SSD - just plain HDD
>>
>> Local tests (dd, hdparm) confirm my disks are capable of delivering
>> 200 MB/s
>> Fio with 15 jobs indicates 100 MB/s
>> ceph tell shows 400 MB/s
>>
>> rados bench with 1 thread provides 3 MB/s
>> rados bench with 32 threads, 2 OSDs (one per server), barely touches
>> 10 MB/s
>> Adding a third server/OSD improves performance slightly (11 MB/s)
>>
>> atop shows disk usage at 100% for extended periods of time
>> Network usage is very low
>> Nothing else is "red"
>>
>> I have removed all TCP settings and left ceph.conf mostly at defaults
>>
>> What am I missing?
>>
>> Many thanks
>>
>> Steven
>>
>>
>> ceph osd tree
>> ID  CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
>>   0   hdd 0.54529     osd.0      up  1.00000 1.00000
>>  -5        0.54529 host osd02
>>   1   hdd 0.54529     osd.1      up        0 1.00000
>>  -7        0       host osd04
>> -17        0.54529 host osd05
>>   2   hdd 0.54529     osd.2      up  1.00000 1.00000
>>
>> [root@osd01 ~]# ceph tell osd.0 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "bytes_per_sec": 452125657
>> }
>>
>> [root@osd01 ~]# ceph tell osd.2 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "bytes_per_sec": 340553488
>> }
>>
>>
>> hdparm -tT /dev/sdc
>>
>> /dev/sdc:
>>  Timing cached reads:   5874 MB in 1.99 seconds = 2948.51 MB/sec
>>  Timing buffered disk reads: 596 MB in 3.01 seconds = 198.17 MB/sec
>>
>> fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k
>> --numjobs=15 --iodepth=1 --runtime=60 --time_based --group_reporting
>> --name=journal-test
>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>> iodepth=1
>> ...
>> fio-2.2.8
>> Starting 15 processes
>> Jobs: 15 (f=15): [W(15)] [100.0% done] [0KB/104.9MB/0KB /s] [0/26.9K/0
>> iops] [eta 00m:00s]
>>
>>
>> fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k
>> --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting
>> --name=journal-test
>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>> iodepth=1
>> ...
>> fio-2.2.8
>> Starting 5 processes
>> Jobs: 5 (f=5): [W(5)] [100.0% done] [0KB/83004KB/0KB /s] [0/20.8K/0
>> iops] [eta 00m:00s]
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
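As a closing cross-check of the figures quoted in the thread above (my own arithmetic, not part of the original messages): rados bench writes 4 MiB objects with 16 ops in flight, and the fio runs use 4 KiB blocks, so the reported bandwidth, IOPS, and latency figures can be checked against each other:

```shell
# rados bench: 4 MiB objects, so bandwidth = IOPS x 4; with 16 ops in
# flight, Little's law (ops in flight / avg latency) recovers IOPS too.
awk 'BEGIN {
    print 40 * 4                     # 160 -> matches Bandwidth 161.111 MB/s
    printf "%.1f\n", 16 / 0.396239   # 40.4 -> matches Average IOPS 40
}'

# fio: reported IOPS x 4 KiB block size reproduces the throughput lines.
awk 'BEGIN {
    print 26900 * 4   # 107600 KB/s ~ the 104.9MB/s seen with 15 jobs
    print 20800 * 4   # 83200 KB/s  ~ the 83004KB/s seen with 5 jobs
}'
```

The numbers are internally consistent, which suggests the cluster really is sustaining ~40 write IOPS per the bench run rather than mis-reporting.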