On 20 December 2015 at 19:23, Francois Lafont <flafdiv...@free.fr> wrote:

> On 20/12/2015 22:51, Don Waterloo wrote:
>
> > All nodes have 10Gbps to each other
>
> Even the link client node <---> cluster nodes?
>
> > OSD:
> > $ ceph osd tree
> > ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> > -1 5.48996 root default
> > -2 0.89999     host nubo-1
> >  0 0.89999         osd.0         up  1.00000          1.00000
> > -3 0.89999     host nubo-2
> >  1 0.89999         osd.1         up  1.00000          1.00000
> > -4 0.89999     host nubo-3
> >  2 0.89999         osd.2         up  1.00000          1.00000
> > -5 0.92999     host nubo-19
> >  3 0.92999         osd.3         up  1.00000          1.00000
> > -6 0.92999     host nubo-20
> >  4 0.92999         osd.4         up  1.00000          1.00000
> > -7 0.92999     host nubo-21
> >  5 0.92999         osd.5         up  1.00000          1.00000
> >
> > Each contains 1 x Samsung 850 Pro 1TB SSD (on SATA)
> >
> > Each is running Ubuntu 15.10 with the 4.3.0-040300-generic kernel.
> > Each is running ceph 0.94.5-0ubuntu0.15.10.1.
> >
> > nubo-1/nubo-2/nubo-3 are 2x X5650 @ 2.67GHz w/ 96GB ram.
> > nubo-19/nubo-20/nubo-21 are 2x E5-2699 v3 @ 2.30GHz, w/ 576GB ram.
> >
> > The connections are to the chipset SATA in each case.
> > The fio test against the underlying xfs disk
> > (e.g. cd /var/lib/ceph/osd/ceph-1; fio --randrepeat=1 --ioengine=libaio
> > --direct=1 --gtod_reduce=1 --name=readwrite --filename=rw.data --bs=4k
> > --iodepth=64 --size=5000MB --readwrite=randrw --rwmixread=50)
> > shows ~22K IOPS on each disk.
> >
> > nubo-1/2/3 are also the mon and the mds:
> > $ ceph status
> >     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
> >      health HEALTH_OK
> >      monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
> >             election epoch 1104, quorum 0,1,2 nubo-1,nubo-2,nubo-3
> >      mdsmap e621: 1/1/1 up {0=nubo-3=up:active}, 2 up:standby
> >      osdmap e2459: 6 osds: 6 up, 6 in
> >       pgmap v127331: 840 pgs, 6 pools, 144 GB data, 107 kobjects
> >             289 GB used, 5332 GB / 5622 GB avail
> >                  840 active+clean
> >   client io 0 B/s rd, 183 kB/s wr, 54 op/s
>
> And you have "replica size == 3" in your cluster, correct?
> Do you have specific mount options or specific options in ceph.conf
> concerning ceph-fuse?
>
> So the hardware configuration of your cluster seems globally much better
> than mine (config given in my first message), because you have 10Gb links
> (between the client and the cluster I have just 1Gb) and you have
> full-SSD OSDs.
>
> I have tried to put _all_ of cephfs on my SSDs, i.e. the pools "cephfsdata"
> _and_ "cephfsmetadata" are on the SSD. The performance is slightly improved,
> since I now get ~670 iops (with the fio command of my first message again),
> but it still seems bad to me.
>
> In fact, I'm curious to have the opinion of cephfs experts on what iops we
> can expect. Maybe ~700 iops is simply correct for our hardware
> configuration and we are searching for a problem which doesn't exist...


All nodes are interconnected at 10G (actually 8x10G, so 80Gbps, but I have 7
of the links disabled for this test). I have run iperf over TCP and verified
that I can achieve ~9Gbps between each pair of nodes. I have jumbo frames
enabled (9000 MTU, 8982 route MTU).
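
The iperf check was something like the following (invocation from memory, so
take the exact flags as approximate; addresses are the public-network ones
from ceph.conf below):

$ iperf -s                    # on nubo-1, run as the server
$ iperf -c 10.100.10.60       # from each other node in turn; reports ~9 Gbit/s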

I have replica 2.

My 2 cephfs pools are:

pool 12 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 256 pgp_num 256 last_change 2239 flags
hashpspool stripe_width 0
pool 13 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 256 pgp_num 256 last_change 2243 flags
hashpspool crash_replay_interval 45 stripe_width 0
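
(That listing is what "ceph osd dump" shows; the replica size can also be
confirmed per pool, e.g.:)

$ ceph osd dump | grep cephfs
$ ceph osd pool get cephfs_data size    # reports size: 2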

With ceph-fuse, I used the defaults except for adding noatime.
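
The mount itself is roughly the following (the mount point is just an
example; the monitor address is one of those listed in ceph.conf below):

$ ceph-fuse -m 10.100.10.60:6789 -o noatime /mnt/cephfs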

My ceph.conf is:

[global]
fsid = XXXX
mon_initial_members = nubo-2, nubo-3, nubo-1
mon_host = 10.100.10.61,10.100.10.62,10.100.10.60
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
public_network = 10.100.10.0/24
osd op threads = 6
osd disk threads = 6

[mon]
    mon clock drift allowed = .600
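
To double-check that those thread settings are actually in effect on a
running OSD, I can query the admin socket on the OSD host (osd.0 as an
example):

$ ceph daemon osd.0 config show | grep -E 'osd_(op|disk)_threads'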