Unfortunately, even after removing all of my custom kernel parameters, the
performance did not improve.

Currently:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1"

Before:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1 intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll numa=off"
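For reference, the change was applied the usual way (a sketch, assuming a
BIOS-booted RHEL/CentOS box; the grub.cfg path differs under UEFI):

  # after editing /etc/default/grub
  grub2-mkconfig -o /boot/grub2/grub.cfg
  reboot
  cat /proc/cmdline   # confirm the running kernel picked up the new parameters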

This is extremely puzzling; any ideas or suggestions for troubleshooting it
will be greatly appreciated.

Steven

On 2 February 2018 at 10:51, Steven Vacaroaia <ste...@gmail.com> wrote:

> Hi Mark,
>
> Thanks
> My pools are using replication = 2
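> (Pool size can be double-checked with, as a sketch assuming the pool is
> named rbd:
>
>   ceph osd pool get rbd size
>   ceph osd pool get rbd min_size )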
>
> I'll re-enable NUMA and report back
>
> Steven
>
> On 2 February 2018 at 10:48, Marc Roos <m.r...@f1-outsourcing.eu> wrote:
>
>>
>> Not sure if this info is of any help; please be aware I am also just in a
>> testing phase with Ceph.
>>
>> I don't know how numa=off is interpreted by the OS. If it just hides
>> NUMA, you could still run into the 'known issues'. That is why I have
>> numad running.
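>> For reference, on RHEL/CentOS that is roughly (a sketch):
>>
>>   yum install -y numad
>>   systemctl enable numad
>>   systemctl start numad
>>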
>> Furthermore, I have put an OSD 'out', which also shows a 0 in the reweight
>> column. So I guess your osd.1 is not participating either? If so, that
>> would not be great if you are testing 3x replication with only 2 disks.
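>> If that is the case, bringing it back in should just be a matter of
>> (hedged, using the id from your tree):
>>
>>   ceph osd in osd.1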
>>
>>
>> I have got this on SATA 5400rpm disks, replicated pool size 3.
>>
>> rados bench -p rbd 30 write --id rbd
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>    20      16       832       816   163.178       180     0.157838    0.387074
>>    21      16       867       851   162.073       140     0.157289     0.38817
>>    22      16       900       884   160.705       132     0.224024    0.393674
>>    23      16       953       937   162.934       212     0.530274    0.388189
>>    24      16       989       973   162.144       144     0.209806    0.389644
>>    25      16      1028      1012   161.898       156     0.118438    0.391057
>>    26      16      1067      1051    161.67       156     0.248463     0.38977
>>    27      16      1112      1096   162.348       180     0.754184    0.392159
>>    28      16      1143      1127   160.977       124     0.439342    0.393641
>>    29      16      1185      1169   161.219       168    0.0801006    0.393004
>>    30      16      1221      1205   160.644       144     0.224278     0.39363
>> Total time run:         30.339270
>> Total writes made:      1222
>> Write size:             4194304
>> Object size:            4194304
>> Bandwidth (MB/sec):     161.111
>> Stddev Bandwidth:       24.6819
>> Max bandwidth (MB/sec): 212
>> Min bandwidth (MB/sec): 120
>> Average IOPS:           40
>> Stddev IOPS:            6
>> Max IOPS:               53
>> Min IOPS:               30
>> Average Latency(s):     0.396239
>> Stddev Latency(s):      0.249998
>> Max latency(s):         1.29482
>> Min latency(s):         0.06875
>>
>>
>> -----Original Message-----
>> From: Steven Vacaroaia [mailto:ste...@gmail.com]
>> Sent: Friday, 2 February 2018 15:25
>> To: ceph-users
>> Subject: [ceph-users] ceph luminous performance - disks at 100% , low
>> network utilization
>>
>> Hi,
>>
>> I have been struggling to get my test cluster to behave (from a
>> performance perspective).
>> Dell R620, 64 GB RAM, 1 CPU, numa=off, PERC H710, RAID 0, enterprise 10K
>> disks
>>
>> No SSD - just plain HDD
>>
>> Local tests (dd, hdparm) confirm my disks are capable of delivering
>> 200 MB/s
>> fio with 15 jobs indicates 100 MB/s
>> ceph tell shows 400 MB/s
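>> The dd test was along these lines (a sketch; note that writing straight to
>> the raw device is destructive, so only on a disk with nothing on it):
>>
>>   dd if=/dev/zero of=/dev/sdc bs=4M count=1024 oflag=direct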
>>
>> rados bench with 1 thread provides 3 MB/s
>> rados bench with 32 threads and 2 OSDs (one per server) barely touches
>> 10 MB/s
>> Adding a third server/OSD improves performance slightly (11 MB/s)
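>> For reference, the single- and multi-threaded runs were essentially (a
>> sketch, pool name assumed):
>>
>>   rados bench -p rbd 30 write -t 1
>>   rados bench -p rbd 30 write -t 32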
>>
>> atop shows disk usage at 100% for extended periods of time
>> Network usage is very low
>> Nothing else is "red"
>>
>> I have removed all TCP settings and left ceph.conf mostly at defaults
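>> What remains is basically just the minimum (a sketch of a mostly-default
>> luminous ceph.conf, not the exact file; fsid and addresses omitted):
>>
>>   [global]
>>   fsid = <cluster fsid>
>>   mon_host = <monitor addresses>
>>   public_network = <public subnet>
>>   cluster_network = <cluster subnet>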
>>
>> What am I missing?
>>
>> Many thanks
>>
>> Steven
>>
>>
>> ceph osd tree
>> ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
>>
>>   0   hdd 0.54529         osd.0          up  1.00000 1.00000
>>  -5       0.54529     host osd02
>>   1   hdd 0.54529         osd.1          up        0 1.00000
>>  -7             0     host osd04
>> -17       0.54529     host osd05
>>   2   hdd 0.54529         osd.2          up  1.00000 1.00000
>>
>> [root@osd01 ~]# ceph tell osd.0 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "bytes_per_sec": 452125657
>> }
>>
>> [root@osd01 ~]# ceph tell osd.2 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "bytes_per_sec": 340553488
>> }
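>> (In more familiar units: 452125657 bytes/s / 1048576 ≈ 431 MB/s for osd.0,
>> and 340553488 / 1048576 ≈ 325 MB/s for osd.2.)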
>>
>>
>> hdparm -tT /dev/sdc
>>
>> /dev/sdc:
>>  Timing cached reads:   5874 MB in  1.99 seconds = 2948.51 MB/sec
>>  Timing buffered disk reads: 596 MB in  3.01 seconds = 198.17 MB/sec
>>
>>  fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k
>> --numjobs=15 --iodepth=1 --runtime=60 --time_based --group_reporting
>> --name=journal-test
>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>> iodepth=1
>> ...
>> fio-2.2.8
>> Starting 15 processes
>> Jobs: 15 (f=15): [W(15)] [100.0% done] [0KB/104.9MB/0KB /s] [0/26.9K/0
>> iops] [eta 00m:00s]
>>
>>
>> fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k
>> --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting
>> --name=journal-test
>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>> iodepth=1
>> ...
>> fio-2.2.8
>> Starting 5 processes
>> Jobs: 5 (f=5): [W(5)] [100.0% done] [0KB/83004KB/0KB /s] [0/20.8K/0
>> iops] [eta 00m:00s]
>>
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
