Hi Frederic,

 

If you are using EC pools, the primary OSD requests the remaining shards of
the object from the other OSDs, reassembles the object, and then sends the
data to the client. The entire object has to be reconstructed even for a
small IO operation, so 4 KB reads can lead to quite large IO amplification
if you are using the default 4 MB object size. I believe this is what you
are seeing; creating an RBD image with a smaller object size can help
reduce this.
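To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the primary OSD holds one shard locally and fetches the remaining k-1 data shards over the network to reconstruct the object (the exact shard-fetch behaviour may differ between releases, so treat the constants and the model as illustrative, matching Frederic's k=4/m=1 profile and 4 MB objects):

```python
# Rough read amplification for an EC pool read, per Nick's explanation.
# Assumption (not verified against the Ceph code): the primary holds one
# data shard locally and pulls the other k-1 data shards from peer OSDs.

OBJECT_SIZE = 4 * 1024 * 1024   # default 4 MB RADOS object
K, M = 4, 1                     # EC profile k/m
SHARD_SIZE = OBJECT_SIZE // K   # 1 MB per data shard

READ_SIZE = 4 * 1024            # a small 4 KB client read

# Bytes the primary pulls over the network to rebuild the whole object:
fetched = (K - 1) * SHARD_SIZE

amplification = fetched / READ_SIZE
print(f"network bytes fetched per 4 KB read: {fetched} "
      f"({amplification:.0f}x amplification)")
```

Under those assumptions a single 4 KB read drags 3 MB of shard data across the cluster network, which is why small random reads against large EC objects hurt so much.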

 

Nick

 

From: ceph-users [mailto:[email protected]] On Behalf Of
SCHAER Frederic
Sent: 23 April 2015 15:40
To: [email protected]
Subject: [ceph-users] read performance VS network usage

 

Hi again,

 

On my testbed, I have 5 Ceph nodes, each containing 23 OSDs (2TB btrfs
drives). For these tests, I've set up a RAID0 on the 23 disks.

For now, I'm not using SSDs, as I discovered my vendor apparently decreased
their performance on purpose.

 

So: 5 server nodes, of which 3 are also MONs.

I also have 5 clients.

All of them have a single 10G NIC; I'm not using a private network.

I'm testing EC pools, with the failure domain set to hosts.

The EC pool k/m is set to k=4/m=1.

I'm testing EC pools using the giant release
(ceph-0.87.1-0.el7.centos.x86_64)

 

And I just found out I have "limited" read performance.

While I was watching the stats with dstat on one server node during the
rados (read) bench, I noticed that all the server nodes were sending about
370 MiB/s on the network, which matches the average read speed I get per
server, but they were also all receiving about 750-800 MiB/s on that same
network. And 800 MiB/s is about as much as you can get on a 10G link.

 

I'm trying to understand why I see this inbound data flow:

- Why does a server node receive data at all during a read bench?

- Why is it about twice as much as the data the node is sending?

- Is this about verifying data integrity at read time?

 

I'm alone on the cluster; it's not being used by anything else.

I will try tomorrow to see if adding a 2nd 10G port (with a private network
this time) improves the performance, but I'm really curious to understand
where the bottleneck is and what Ceph is doing.

 

Looking at the write performance, I see the same kind of behavior: nodes
send about half the amount of data they receive (600 MB/s in, 300 MB/s
out), but this might be because this time the client only sends the real
data and the erasure coding happens behind the scenes (or not?).
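If that guess is right, the 2:1 receive/send ratio on writes falls out of simple accounting. The following sketch assumes the client sends the full object to the primary, which encodes it into k+m shards, keeps one, and pushes the other four to peer OSDs (an assumption, not verified against the Giant code):

```python
# Rough traffic accounting for one 4 MB EC write with k=4, m=1.
# Assumption: client sends the whole object to the primary, which keeps
# one shard and pushes k+m-1 = 4 shards of 1 MB each to peer OSDs.

OBJECT = 4.0                 # MB, the object written by the client
K, M = 4, 1
SHARD = OBJECT / K           # 1 MB per shard

# Summed over all server nodes:
recv = OBJECT + (K + M - 1) * SHARD   # client->primary + primary->peers
sent = (K + M - 1) * SHARD            # primary pushing shards out

print(f"servers receive {recv} MB, send {sent} MB "
      f"(ratio {recv / sent:.1f}:1)")
```

That gives 8 MB received vs. 4 MB sent per object on the server side, i.e. exactly the 2:1 ratio Frederic observed (600 MB/s in vs. 300 MB/s out).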

 

Any ideas?

 

Regards

Frederic




_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
