I am not sure why, but I cannot get Jumbo Frames to work properly:
root@virt2:~# ping -M do -s 8972 -c 4 10.10.10.83
PING 10.10.10.83 (10.10.10.83) 8972(9000) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
Jumbo frames are enabled on the switch and on the NICs:
ens2f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.10.10.83 netmask 255.255.255.0 broadcast 10.10.10.255
inet6 fe80::ec4:7aff:feea:7b40 prefixlen 64 scopeid 0x20<link>
ether 0c:c4:7a:ea:7b:40 txqueuelen 1000 (Ethernet)
RX packets 166440655 bytes 229547410625 (213.7 GiB)
RX errors 0 dropped 223 overruns 0 frame 0
TX packets 142788790 bytes 188658602086 (175.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@virt2:~# ifconfig ens2f0
ens2f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.10.10.82 netmask 255.255.255.0 broadcast 10.10.10.255
inet6 fe80::ec4:7aff:feea:ff2c prefixlen 64 scopeid 0x20<link>
ether 0c:c4:7a:ea:ff:2c txqueuelen 1000 (Ethernet)
RX packets 466774 bytes 385578454 (367.7 MiB)
RX errors 4 dropped 223 overruns 0 frame 3
TX packets 594975 bytes 580053745 (553.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
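(For reference: the 8972-byte payload is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The "mtu=1500" in the error suggests the kernel's route to 10.10.10.83 leaves via an interface or bridge that is still at 1500. A sketch of the checks, using the interface name and IP from above; adapt to your setup:)

```shell
# Payload for a full 9000-byte frame: MTU minus IPv4 (20) and ICMP (8) headers.
echo $((9000 - 20 - 8))    # 8972

# Checks to run on the live hosts (shown as comments, since they need the real NICs):
#   ip route get 10.10.10.83          # which interface/bridge the ping actually leaves by
#   ip link show ens2f0 | grep -o 'mtu [0-9]*'
#   ip link set dev ens2f0 mtu 9000   # non-persistent; also persist it, e.g. in
#                                     # /etc/network/interfaces on Proxmox/Debian
#   tracepath -n 10.10.10.83          # reports where along the path the MTU drops
```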
On Mon, Nov 20, 2017 at 2:13 PM, Sébastien VIGNERON <[email protected]> wrote:
> As a jumbo frame test, can you try the following?
>
> ping -M do -s 8972 -c 4 IP_of_other_node_within_cluster_network
>
> If you get "ping: sendto: Message too long", jumbo frames are not
> activated.
>
> Cordialement / Best regards,
>
> Sébastien VIGNERON
> CRIANN,
> Ingénieur / Engineer
> Technopôle du Madrillet
> 745, avenue de l'Université
> 76800 Saint-Etienne du Rouvray - France
>
> tél. +33 2 32 91 42 91
> fax. +33 2 32 91 42 92
> http://www.criann.fr
> mailto:[email protected]
> support: [email protected]
>
> On 20 Nov 2017, at 13:02, Rudi Ahlers <[email protected]> wrote:
>
> We're planning on installing 12X Virtual Machines with some heavy loads.
>
> the SSD drives are INTEL SSDSC2BA400G4
>
> The SATA drives are ST8000NM0055-1RM112
>
> Please explain your comment, "b) will find a lot of people here who don't
> approve of it."
>
> I don't have access to the switches right now, but they're new so whatever
> default config ships from factory would be active. Though iperf shows 10.5
> GBytes / 9.02 Gbits/sec throughput.
>
> What speeds would you expect?
> "Though with your setup I would have expected something faster, but NOT
> the theoretical 600MB/s 4 HDDs will do in sequential writes."
>
>
>
> On this, "If an OSD has no fast WAL/DB, it will drag the overall speed
> down. Verify and if so fix this and re-test.": how?
>
>
> On Mon, Nov 20, 2017 at 1:44 PM, Christian Balzer <[email protected]> wrote:
>
>> On Mon, 20 Nov 2017 12:38:55 +0200 Rudi Ahlers wrote:
>>
>> > Hi,
>> >
>> > Can someone please help me improve performance on our Ceph cluster?
>> >
>> > The hardware in use are as follows:
>> > 3x SuperMicro servers with the following configuration
>> > 12-core dual XEON 2.2GHz
>> Faster cores are better for Ceph, IMNSHO.
>> Though with main storage on HDDs, this will do.
>>
>> > 128GB RAM
>> Overkill for Ceph but I see something else below...
>>
>> > 2x 400GB Intel DC SSD drives
>> Exact model please.
>>
>> > 4x 8TB Seagate 7200rpm 6Gbps SATA HDD's
>> One hopes that's a non-SMR one.
>> Model please.
>>
>> > 1x SuperMicro DOM for Proxmox / Debian OS
>> Ah, Proxmox.
>> I'm personally not averse to converged, high density, multi-role clusters
>> myself, but you:
>> a) need to know what you're doing and
>> b) will find a lot of people here who don't approve of it.
>>
>> I've avoided DOMs so far (non-hotswappable SPOF), even though the SM ones
>> look good on paper with regards to endurance and IOPS.
>> The latter being rather important for your monitors.
>>
>> > 4x port 10GbE NIC
>> > Cisco 10GbE switch.
>> >
>> Configuration would be nice for those, LACP?
>>
>> >
>> > root@virt2:~# rados bench -p Data 10 write --no-cleanup
>> > hints = 1
>> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size
>> > 4194304 for up to 10 seconds or 0 objects
>>
>> rados bench is a limited tool, and measuring bandwidth is pointless in
>> nearly all use cases.
>> Latency is where it is at and testing from inside a VM is more relevant
>> than synthetic tests of the storage.
>> But it is a start.
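A latency-oriented test of the kind described here could be run from inside a guest with fio, for example (a sketch; fio being installed and the target path are assumptions, not from the thread):

```shell
# Single-threaded 4k synchronous random writes at queue depth 1 --
# this exposes per-operation latency rather than aggregate bandwidth.
fio --name=latency-test --filename=/root/fio.test --size=256M \
    --rw=randwrite --bs=4k --ioengine=libaio --direct=1 --sync=1 \
    --iodepth=1 --runtime=30 --time_based --group_reporting
```

The "clat" percentiles in fio's output are the figures to watch; a cluster can show decent MB/s in rados bench while individual writes from a VM still take tens of milliseconds.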
>>
>> > Object prefix: benchmark_data_virt2_39099
>> >  sec Cur ops started finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>> >    0      0       0        0         0         0            -           0
>> >    1     16      85       69   275.979       276     0.185576    0.204146
>> >    2     16     171      155   309.966       344    0.0625409    0.193558
>> >    3     16     243      227   302.633       288    0.0547129     0.19835
>> >    4     16     330      314   313.965       348    0.0959492    0.199825
>> >    5     16     413      397   317.565       332     0.124908    0.196191
>> >    6     16     494      478   318.633       324       0.1556    0.197014
>> >    7     15     591      576   329.109       392     0.136305    0.192192
>> >    8     16     670      654   326.965       312    0.0703808    0.190643
>> >    9     16     757      741   329.297       348     0.165211    0.192183
>> >   10     16     828      812   324.764       284    0.0935803    0.194041
>> > Total time run: 10.120215
>> > Total writes made: 829
>> > Write size: 4194304
>> > Object size: 4194304
>> > Bandwidth (MB/sec): 327.661
>> What part of this surprises you?
>>
>> With a replication of 3, you have effectively the bandwidth of your 2 SSDs
>> (for small writes, not the case here) and the bandwidth of your 4 HDDs
>> available.
>> Given overhead, other inefficiencies and the fact that this is not a
>> sequential write from the HDD perspective, 320MB/s isn't all that bad.
>> Though with your setup I would have expected something faster, but NOT the
>> theoretical 600MB/s 4 HDDs will do in sequential writes.
>>
>> > Stddev Bandwidth: 35.8664
>> > Max bandwidth (MB/sec): 392
>> > Min bandwidth (MB/sec): 276
>> > Average IOPS: 81
>> > Stddev IOPS: 8
>> > Max IOPS: 98
>> > Min IOPS: 69
>> > Average Latency(s): 0.195191
>> > Stddev Latency(s): 0.0830062
>> > Max latency(s): 0.481448
>> > Min latency(s): 0.0414858
>> > root@virt2:~# hdparm -I /dev/sda
>> >
>> >
>> >
>> > root@virt2:~# ceph osd tree
>> > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> > -1 72.78290 root default
>> > -3 29.11316 host virt1
>> > 1 hdd 7.27829 osd.1 up 1.00000 1.00000
>> > 2 hdd 7.27829 osd.2 up 1.00000 1.00000
>> > 3 hdd 7.27829 osd.3 up 1.00000 1.00000
>> > 4 hdd 7.27829 osd.4 up 1.00000 1.00000
>> > -5 21.83487 host virt2
>> > 5 hdd 7.27829 osd.5 up 1.00000 1.00000
>> > 6 hdd 7.27829 osd.6 up 1.00000 1.00000
>> > 7 hdd 7.27829 osd.7 up 1.00000 1.00000
>> > -7 21.83487 host virt3
>> > 8 hdd 7.27829 osd.8 up 1.00000 1.00000
>> > 9 hdd 7.27829 osd.9 up 1.00000 1.00000
>> > 10 hdd 7.27829 osd.10 up 1.00000 1.00000
>> > 0 0 osd.0 down 0 1.00000
>> >
>> >
>> > root@virt2:~# ceph -s
>> > cluster:
>> > id: 278a2e9c-0578-428f-bd5b-3bb348923c27
>> > health: HEALTH_OK
>> >
>> > services:
>> > mon: 3 daemons, quorum virt1,virt2,virt3
>> > mgr: virt1(active)
>> > osd: 11 osds: 10 up, 10 in
>> >
>> > data:
>> > pools: 1 pools, 512 pgs
>> > objects: 6084 objects, 24105 MB
>> > usage: 92822 MB used, 74438 GB / 74529 GB avail
>> > pgs: 512 active+clean
>> >
>> > root@virt2:~# ceph -w
>> > cluster:
>> > id: 278a2e9c-0578-428f-bd5b-3bb348923c27
>> > health: HEALTH_OK
>> >
>> > services:
>> > mon: 3 daemons, quorum virt1,virt2,virt3
>> > mgr: virt1(active)
>> > osd: 11 osds: 10 up, 10 in
>> >
>> > data:
>> > pools: 1 pools, 512 pgs
>> > objects: 6084 objects, 24105 MB
>> > usage: 92822 MB used, 74438 GB / 74529 GB avail
>> > pgs: 512 active+clean
>> >
>> >
>> > 2017-11-20 12:32:08.199450 mon.virt1 [INF] mon.1 10.10.10.82:6789/0
>> >
>> >
>> >
>> > The SSD drives are used as journal drives:
>> >
>> Bluestore has no journals, don't confuse it and the people you're asking
>> for help.
>>
>> > root@virt3:~# ceph-disk list | grep /dev/sde | grep osd
>> > /dev/sdb1 ceph data, active, cluster ceph, osd.8, block /dev/sdb2,
>> > block.db /dev/sde1
>> > root@virt3:~# ceph-disk list | grep /dev/sdf | grep osd
>> > /dev/sdc1 ceph data, active, cluster ceph, osd.9, block /dev/sdc2,
>> > block.db /dev/sdf1
>> > /dev/sdd1 ceph data, active, cluster ceph, osd.10, block /dev/sdd2,
>> > block.db /dev/sdf2
>> >
>> >
>> >
>> > I see now /dev/sda doesn't have a journal, though it should have. Not
>> > sure why.
>> If an OSD has no fast WAL/DB, it will drag the overall speed down.
>>
>> Verify and if so fix this and re-test.
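The verification could reuse the same ceph-disk listing shown earlier in this thread, checked per OSD (a sketch; device names and paths follow the examples above and need a live cluster):

```shell
# Flag OSD data partitions whose listing shows no separate block.db device --
# those keep their DB/WAL on the slow HDD itself.
ceph-disk list 2>/dev/null | awk '/osd\./ && !/block\.db/ {print "no fast DB:", $0}'

# Or inspect the mounted OSD directories directly; block.db should be a
# symlink pointing at a partition on one of the SSDs:
ls -l /var/lib/ceph/osd/ceph-*/block.db
```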
>>
>> Christian
>>
>> > This is the command I used to create it:
>> >
>> >
>> > pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde
>> >
>> >
>>
>>
>> --
>> Christian Balzer Network/Systems Engineer
>> [email protected] Rakuten Communications
>>
>
>
>
> --
> Kind Regards
> Rudi Ahlers
> Website: http://www.rudiahlers.co.za
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
--
Kind Regards
Rudi Ahlers
Website: http://www.rudiahlers.co.za