Hi German,

We have a similar config:
proxmox-ve: 5.1-27 (running kernel: 4.13.8-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.8-1-pve: 4.13.8-27
ceph: 12.2.1-pve3
system (4 nodes): Supermicro 2028U-TN24R4T+
2-port Mellanox ConnectX-3 Pro 56Gbit
4-port Intel 10GbE
memory: 768 GB
CPU: dual Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
ceph: 28 OSDs
24x Intel NVMe 2000GB SSD DC P3520, 2.5", PCIe 3.0 x4
4x Intel NVMe 1.6TB SSD DC P3700, 2.5", U.2 PCIe 3.0
Sysbench in a container:

#!/bin/bash
sysbench --test=fileio --file-total-size=4G --file-num=64 prepare
for run in 1 2 3; do
  for thread in 1 4 8 16 32; do
    echo "Performing test RW-${thread}T-${run}"
    sysbench --test=fileio --file-total-size=4G --file-test-mode=rndwr \
      --max-time=60 --max-requests=0 --file-block-size=4K --file-num=64 \
      --num-threads=${thread} run > /root/RW-${thread}T-${run}
    echo "Performing test RR-${thread}T-${run}"
    sysbench --test=fileio --file-total-size=4G --file-test-mode=rndrd \
      --max-time=60 --max-requests=0 --file-block-size=4K --file-num=64 \
      --num-threads=${thread} run > /root/RR-${thread}T-${run}
    echo "Performing test SQ-${thread}T-${run}"
    sysbench --test=oltp --db-driver=mysql --oltp-table-size=40000000 \
      --mysql-db=sysbench --mysql-user=sysbench --mysql-password=password \
      --max-time=60 --max-requests=0 --num-threads=${thread} run \
      > /root/SQ-${thread}T-${run}
  done
done
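To condense the per-run result files into one number per test, a small helper can average the Requests/sec values across the three runs (a sketch; it assumes the file naming produced by the script above):

```shell
# avg_rps: print the mean 'Requests/sec' across a set of sysbench result files.
# Usage: avg_rps /root/RW-8T-*   (file naming as produced by the script above)
avg_rps() {
  grep -h 'Requests/sec' "$@" \
    | awk '{sum += $1; n++} END {if (n) printf "%.2f\n", sum / n}'
}
```

For example, `avg_rps /root/RW-32T-*` would average the three 32-thread random-write runs.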
grep transactions: S*
SQ-1T-1: transactions: 6009 (100.14 per sec.)
SQ-1T-2: transactions: 9458 (157.62 per sec.)
SQ-1T-3: transactions: 9479 (157.97 per sec.)
SQ-4T-1: transactions: 26574 (442.84 per sec.)
SQ-4T-2: transactions: 28275 (471.20 per sec.)
SQ-4T-3: transactions: 28067 (467.69 per sec.)
SQ-8T-1: transactions: 44450 (740.78 per sec.)
SQ-8T-2: transactions: 44410 (740.09 per sec.)
SQ-8T-3: transactions: 44459 (740.93 per sec.)
SQ-16T-1: transactions: 59866 (997.59 per sec.)
SQ-16T-2: transactions: 59539 (991.99 per sec.)
SQ-16T-3: transactions: 59615 (993.50 per sec.)
SQ-32T-1: transactions: 71070 (1184.18 per sec.)
SQ-32T-2: transactions: 71007 (1183.14 per sec.)
SQ-32T-3: transactions: 71320 (1188.51 per sec.)
grep Requests/sec R*
RR-16T-1:1464550.51 Requests/sec executed
RR-16T-2:1473440.63 Requests/sec executed
RR-16T-3:1515853.86 Requests/sec executed
RR-1T-1:741333.28 Requests/sec executed
RR-1T-2:693246.00 Requests/sec executed
RR-1T-3:691166.38 Requests/sec executed
RR-32T-1:1432609.74 Requests/sec executed
RR-32T-2:1479191.78 Requests/sec executed
RR-32T-3:1476780.11 Requests/sec executed
RR-4T-1:1411168.95 Requests/sec executed
RR-4T-2:1373557.99 Requests/sec executed
RR-4T-3:1306820.18 Requests/sec executed
RR-8T-1:1549924.57 Requests/sec executed
RR-8T-2:1580304.14 Requests/sec executed
RR-8T-3:1603842.56 Requests/sec executed
RW-16T-1:12753.82 Requests/sec executed
RW-16T-2:12394.93 Requests/sec executed
RW-16T-3:12560.11 Requests/sec executed
RW-1T-1: 1344.99 Requests/sec executed
RW-1T-2: 1324.98 Requests/sec executed
RW-1T-3: 1306.64 Requests/sec executed
RW-32T-1:16565.37 Requests/sec executed
RW-32T-2:16497.67 Requests/sec executed
RW-32T-3:16542.54 Requests/sec executed
RW-4T-1: 5099.07 Requests/sec executed
RW-4T-2: 4970.28 Requests/sec executed
RW-4T-3: 5121.44 Requests/sec executed
RW-8T-1: 8487.91 Requests/sec executed
RW-8T-2: 8632.96 Requests/sec executed
RW-8T-3: 8393.91 Requests/sec executed
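It may also help to baseline the cluster itself with rados bench, which takes the client filesystem and database out of the picture (the pool name 'rbd' below is just a placeholder; substitute one of your pools):

```shell
# Baseline 4K writes directly against the cluster for 60 s with 32
# concurrent ops, keep the objects, then read them back sequentially.
rados bench -p rbd 60 write -b 4096 -t 32 --no-cleanup
rados bench -p rbd 60 seq -t 32
# Remove the benchmark objects afterwards.
rados -p rbd cleanup
```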
Gerhard W. Recher
net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing
+49 171 4802507
On 27.11.2017 at 14:02, German Anders wrote:
> Hi All,
>
> I have a performance question: we recently installed a brand-new Ceph
> cluster with all-NVMe disks, using Ceph version 12.2.0 with BlueStore
> configured. The back end of the cluster uses a bonded IPoIB link
> (active/passive), and the front end uses a bonded configuration in
> active/active mode (20GbE) to communicate with the clients.
>
> The cluster configuration is the following:
>
> *MON Nodes:*
> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
> 3x 1U servers:
> 2x Intel Xeon E5-2630v4 @2.2Ghz
> 128G RAM
> 2x Intel SSD DC S3520 150G (in RAID-1 for OS)
> 2x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>
> *OSD Nodes:*
> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
> 4x 2U servers:
> 2x Intel Xeon E5-2640v4 @2.4Ghz
> 128G RAM
> 2x Intel SSD DC S3520 150G (in RAID-1 for OS)
> 1x Ethernet Controller 10G X550T
> 1x 82599ES 10-Gigabit SFI/SFP+ Network Connection
> 12x Intel SSD DC P3520 1.2T (NVMe) for OSD daemons
> 1x Mellanox ConnectX-3 InfiniBand FDR 56Gb/s Adapter (dual port)
>
>
> Here's the tree:
>
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -7 48.00000 root root
> -5 24.00000 rack rack1
> -1 12.00000 node cpn01
> 0 nvme 1.00000 osd.0 up 1.00000 1.00000
> 1 nvme 1.00000 osd.1 up 1.00000 1.00000
> 2 nvme 1.00000 osd.2 up 1.00000 1.00000
> 3 nvme 1.00000 osd.3 up 1.00000 1.00000
> 4 nvme 1.00000 osd.4 up 1.00000 1.00000
> 5 nvme 1.00000 osd.5 up 1.00000 1.00000
> 6 nvme 1.00000 osd.6 up 1.00000 1.00000
> 7 nvme 1.00000 osd.7 up 1.00000 1.00000
> 8 nvme 1.00000 osd.8 up 1.00000 1.00000
> 9 nvme 1.00000 osd.9 up 1.00000 1.00000
> 10 nvme 1.00000 osd.10 up 1.00000 1.00000
> 11 nvme 1.00000 osd.11 up 1.00000 1.00000
> -3 12.00000 node cpn03
> 24 nvme 1.00000 osd.24 up 1.00000 1.00000
> 25 nvme 1.00000 osd.25 up 1.00000 1.00000
> 26 nvme 1.00000 osd.26 up 1.00000 1.00000
> 27 nvme 1.00000 osd.27 up 1.00000 1.00000
> 28 nvme 1.00000 osd.28 up 1.00000 1.00000
> 29 nvme 1.00000 osd.29 up 1.00000 1.00000
> 30 nvme 1.00000 osd.30 up 1.00000 1.00000
> 31 nvme 1.00000 osd.31 up 1.00000 1.00000
> 32 nvme 1.00000 osd.32 up 1.00000 1.00000
> 33 nvme 1.00000 osd.33 up 1.00000 1.00000
> 34 nvme 1.00000 osd.34 up 1.00000 1.00000
> 35 nvme 1.00000 osd.35 up 1.00000 1.00000
> -6 24.00000 rack rack2
> -2 12.00000 node cpn02
> 12 nvme 1.00000 osd.12 up 1.00000 1.00000
> 13 nvme 1.00000 osd.13 up 1.00000 1.00000
> 14 nvme 1.00000 osd.14 up 1.00000 1.00000
> 15 nvme 1.00000 osd.15 up 1.00000 1.00000
> 16 nvme 1.00000 osd.16 up 1.00000 1.00000
> 17 nvme 1.00000 osd.17 up 1.00000 1.00000
> 18 nvme 1.00000 osd.18 up 1.00000 1.00000
> 19 nvme 1.00000 osd.19 up 1.00000 1.00000
> 20 nvme 1.00000 osd.20 up 1.00000 1.00000
> 21 nvme 1.00000 osd.21 up 1.00000 1.00000
> 22 nvme 1.00000 osd.22 up 1.00000 1.00000
> 23 nvme 1.00000 osd.23 up 1.00000 1.00000
> -4 12.00000 node cpn04
> 36 nvme 1.00000 osd.36 up 1.00000 1.00000
> 37 nvme 1.00000 osd.37 up 1.00000 1.00000
> 38 nvme 1.00000 osd.38 up 1.00000 1.00000
> 39 nvme 1.00000 osd.39 up 1.00000 1.00000
> 40 nvme 1.00000 osd.40 up 1.00000 1.00000
> 41 nvme 1.00000 osd.41 up 1.00000 1.00000
> 42 nvme 1.00000 osd.42 up 1.00000 1.00000
> 43 nvme 1.00000 osd.43 up 1.00000 1.00000
> 44 nvme 1.00000 osd.44 up 1.00000 1.00000
> 45 nvme 1.00000 osd.45 up 1.00000 1.00000
> 46 nvme 1.00000 osd.46 up 1.00000 1.00000
> 47 nvme 1.00000 osd.47 up 1.00000 1.00000
>
> The disk partition of one of the OSD nodes:
>
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> nvme6n1 259:1 0 1.1T 0 disk
> ├─nvme6n1p2 259:15 0 1.1T 0 part
> └─nvme6n1p1 259:13 0 100M 0 part /var/lib/ceph/osd/ceph-6
> nvme9n1 259:0 0 1.1T 0 disk
> ├─nvme9n1p2 259:8 0 1.1T 0 part
> └─nvme9n1p1 259:7 0 100M 0 part /var/lib/ceph/osd/ceph-9
> sdb 8:16 0 139.8G 0 disk
> └─sdb1 8:17 0 139.8G 0 part
> └─md0 9:0 0 139.6G 0 raid1
> ├─md0p2 259:31 0 1K 0 md
> ├─md0p5 259:32 0 139.1G 0 md
> │ ├─cpn01--vg-swap 253:1 0 27.4G 0 lvm [SWAP]
> │ └─cpn01--vg-root 253:0 0 111.8G 0 lvm /
> └─md0p1 259:30 0 486.3M 0 md /boot
> nvme11n1 259:2 0 1.1T 0 disk
> ├─nvme11n1p1 259:12 0 100M 0 part /var/lib/ceph/osd/ceph-11
> └─nvme11n1p2 259:14 0 1.1T 0 part
> nvme2n1 259:6 0 1.1T 0 disk
> ├─nvme2n1p2 259:21 0 1.1T 0 part
> └─nvme2n1p1 259:20 0 100M 0 part /var/lib/ceph/osd/ceph-2
> nvme5n1 259:3 0 1.1T 0 disk
> ├─nvme5n1p1 259:9 0 100M 0 part /var/lib/ceph/osd/ceph-5
> └─nvme5n1p2 259:10 0 1.1T 0 part
> nvme8n1 259:24 0 1.1T 0 disk
> ├─nvme8n1p1 259:26 0 100M 0 part /var/lib/ceph/osd/ceph-8
> └─nvme8n1p2 259:28 0 1.1T 0 part
> nvme10n1 259:11 0 1.1T 0 disk
> ├─nvme10n1p1 259:22 0 100M 0 part /var/lib/ceph/osd/ceph-10
> └─nvme10n1p2 259:23 0 1.1T 0 part
> nvme1n1 259:33 0 1.1T 0 disk
> ├─nvme1n1p1 259:34 0 100M 0 part /var/lib/ceph/osd/ceph-1
> └─nvme1n1p2 259:35 0 1.1T 0 part
> nvme4n1 259:5 0 1.1T 0 disk
> ├─nvme4n1p1 259:18 0 100M 0 part /var/lib/ceph/osd/ceph-4
> └─nvme4n1p2 259:19 0 1.1T 0 part
> nvme7n1 259:25 0 1.1T 0 disk
> ├─nvme7n1p1 259:27 0 100M 0 part /var/lib/ceph/osd/ceph-7
> └─nvme7n1p2 259:29 0 1.1T 0 part
> sda 8:0 0 139.8G 0 disk
> └─sda1 8:1 0 139.8G 0 part
> └─md0 9:0 0 139.6G 0 raid1
> ├─md0p2 259:31 0 1K 0 md
> ├─md0p5 259:32 0 139.1G 0 md
> │ ├─cpn01--vg-swap 253:1 0 27.4G 0 lvm [SWAP]
> │ └─cpn01--vg-root 253:0 0 111.8G 0 lvm /
> └─md0p1 259:30 0 486.3M 0 md /boot
> nvme0n1 259:36 0 1.1T 0 disk
> ├─nvme0n1p1 259:37 0 100M 0 part /var/lib/ceph/osd/ceph-0
> └─nvme0n1p2 259:38 0 1.1T 0 part
> nvme3n1 259:4 0 1.1T 0 disk
> ├─nvme3n1p1 259:16 0 100M 0 part /var/lib/ceph/osd/ceph-3
> └─nvme3n1p2 259:17 0 1.1T 0 part
>
>
> For the disk scheduler we're using [kyber]; for read_ahead_kb we
> tried different values (0, 128 and 2048); rq_affinity is set to 2;
> and the rotational parameter is set to 0.
> We've also set the CPU governor to performance on all cores and
> tuned some sysctl parameters:
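(For reference, the block-device settings described above can be applied per device via sysfs; the device name nvme0n1 below is an example, to be repeated for each OSD device:)

```shell
# Apply the block-device settings mentioned above to one NVMe device.
# nvme0n1 is an example device name; loop over all OSD devices as needed.
echo kyber > /sys/block/nvme0n1/queue/scheduler
echo 128   > /sys/block/nvme0n1/queue/read_ahead_kb
echo 2     > /sys/block/nvme0n1/queue/rq_affinity
echo 0     > /sys/block/nvme0n1/queue/rotational
```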
>
> # for Ceph
> net.ipv4.ip_forward=0
> net.ipv4.conf.default.rp_filter=1
> kernel.sysrq=0
> kernel.core_uses_pid=1
> net.ipv4.tcp_syncookies=0
> #net.netfilter.nf_conntrack_max=2621440
> #net.netfilter.nf_conntrack_tcp_timeout_established = 1800
> # disable netfilter on bridges
> #net.bridge.bridge-nf-call-ip6tables = 0
> #net.bridge.bridge-nf-call-iptables = 0
> #net.bridge.bridge-nf-call-arptables = 0
> vm.min_free_kbytes=1000000
>
> # Controls the default maximum size of a message queue, in bytes
> kernel.msgmnb = 65536
>
> # Controls the maximum size of a single message, in bytes
> kernel.msgmax = 65536
>
> # Controls the maximum shared segment size, in bytes
> kernel.shmmax = 68719476736
>
> # Controls the total amount of shared memory allowed, in pages
> kernel.shmall = 4294967296
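(Such settings can be persisted in a sysctl drop-in file and reloaded without a reboot; the file name 90-ceph-tuning.conf below is just an example:)

```shell
# Persist a couple of the sysctl settings from above and apply them.
cat > /etc/sysctl.d/90-ceph-tuning.conf <<'EOF'
vm.min_free_kbytes = 1000000
kernel.msgmnb = 65536
EOF
sysctl --system
```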
>
>
> The ceph.conf file is:
>
> ...
> osd_pool_default_size = 3
> osd_pool_default_min_size = 2
> osd_pool_default_pg_num = 1600
> osd_pool_default_pgp_num = 1600
>
> debug_crush = 1/1
> debug_buffer = 0/1
> debug_timer = 0/0
> debug_filer = 0/1
> debug_objecter = 0/1
> debug_rados = 0/5
> debug_rbd = 0/5
> debug_ms = 0/5
> debug_throttle = 1/1
>
> debug_journaler = 0/0
> debug_objectcacher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_journal = 0/0
> debug_filestore = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
>
> osd_crush_chooseleaf_type = 0
> filestore_xattr_use_omap = true
>
> rbd_cache = true
> mon_compact_on_trim = false
>
> [osd]
> osd_crush_update_on_start = false
>
> [client]
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_default_features = 1
> admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> log_file = /var/log/ceph/
>
>
> The cluster has two production pools: one for OpenStack (volumes) with
> a replication factor of 3, and another pool for databases (db) with a
> replication factor of 2. The DBA team has performed several tests with
> a volume mounted on the DB server (via RBD). The DB server has the
> following configuration:
>
> OS: CentOS 6.9 | kernel 4.14.1
> DB: MySQL
> ProLiant BL685c G7
> 4x AMD Opteron Processor 6376 (total of 64 cores)
> 128G RAM
> 1x OneConnect 10Gb NIC (quad-port) - in a bond configuration
> (active/active) with 3 vlans
>
>
>
> We also did some tests with *sysbench* on different storage types:
>
> disk            tps       qps        95th-percentile latency (ms)
> Local SSD       261.28    5,225.61    5.18
> Ceph NVMe        95.18    1,903.53   12.30
> Pure Storage    196.49    3,929.71    6.32
> NetApp FAS      189.83    3,796.59    6.67
> EMC VMAX        196.14    3,922.82    6.32
>
>
>
> Is there any specific tuning that I can apply to the ceph cluster, in
> order to improve those numbers? Or are those numbers ok for the type
> and size of the cluster that we have? Any advice would be really
> appreciated.
>
> Thanks,
>
>
> *
> *
> *German
> *
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
