Hi
I have a 7-node Ceph cluster built with Ceph Kraken. HW details: each node
has 5 x 1TB drives and a single SSD, which has been partitioned to provide
a Ceph journal for each of the 5 drives on that node.
The network is 10GigE, and each node has 16 CPUs (Intel Haswell family chipset).

I also set up radosgw, one instance on each of the 7 nodes (7 total).

Now, if I attempt to upload a single 1.5GB test file via s3cmd to the ceph
cluster, into an erasure-coded pool -

pool 53 'default.rgw.buckets.data' erasure size 7 min_size 5 crush_ruleset
1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 808 flags
hashpspool stripe_width 4160
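For reference, a size 7 / min_size 5 EC pool implies k=5 data chunks plus m=2 coding chunks. That split is an assumption on my part (the erasure-code profile isn't shown above; `ceph osd erasure-code-profile get <profile>` would confirm it), but the back-of-the-envelope write fan-out looks like this:

```python
# Rough math for the EC pool above.
# ASSUMPTION: size=7, min_size=5 implies k=5 data chunks + m=2 coding chunks.
k, m = 5, 2

# Space overhead relative to raw object size (vs 3.0x for size-3 replication).
overhead = (k + m) / k          # 1.4x

# Every RADOS write fans out to k+m OSDs, and the primary OSD must encode
# the full stripe before any chunk is written, which adds latency per op.
osds_per_write = k + m

print(f"space overhead: {overhead:.1f}x, OSDs touched per write: {osds_per_write}")
```

So each object write touches all 7 OSDs in the acting set, which makes per-operation latency, not disk bandwidth, the likely bottleneck.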

then the data transfer speed is very low, averaging about 100MB/s. I
understand that EC pools perform worse than replicated pools, but this
seems extremely low.

If I create a new replicated pool, stop the radosgw processes, rename the
replicated pool to default.rgw.buckets.data, and re-transfer the file
(after clearing all system caches), the data transfer rate is much higher,
about 600MB/s. If I run it in parallel, uploading 6 different 1.5GB files
to 6 of the radosgw's simultaneously, then monitoring via ceph -w (or the
ceph-dash tool I found online) shows total transfer speeds of around
2.5-3GB/s (about 400-500MB/s per radosgw).
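A parallel run like the one above can be scripted with a small harness along these lines (a sketch only; the per-gateway s3cmd config paths and bucket names below are hypothetical placeholders):

```python
# Sketch of a parallel-upload timing harness.
# HYPOTHETICAL: the --config paths and bucket names are placeholders;
# substitute one s3cmd config per radosgw endpoint being tested.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands):
    """Run each command concurrently; return total wall-clock seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        codes = list(pool.map(lambda c: subprocess.run(c).returncode, commands))
    if any(rc != 0 for rc in codes):
        raise RuntimeError("one or more uploads failed")
    return time.monotonic() - start

# Example usage (would actually run the uploads):
# cmds = [["s3cmd", f"--config=/etc/s3cfg.rgw{i}",
#          "put", f"testfile{i}", f"s3://bucket/testfile{i}"]
#         for i in range(1, 7)]
# print(f"{6 * 1536 / run_parallel(cmds):.0f} MB/s aggregate")  # six 1.5GB files
```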

Switching back to the same test of uploading 6 different 1.5GB files to 6
radosgw's, but with the erasure-coded pool as the backend, throughput drops
again to a cumulative 150MB/s (across all 6!).

Why is s3 performance so much lower on the erasure-coded pool? iostat shows
the drives are nowhere near maxed out (utilization is around 20% at most).

Is there some tuning that needs to be done, or is something else missing
here? I've also experimented with changing pg_num/pgp_num, setting them to
4096/8192, and performance with the EC pool is still much, much lower than
with the replicated pool.
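On the pg_num side, the usual rule of thumb is roughly 100 PGs per OSD, divided by the pool's size (replica count, or k+m for EC), rounded up to a power of two, so 4096/8192 may actually be far too high for this cluster. A sketch of that arithmetic, assuming 7 nodes x 5 OSDs = 35 OSDs and that this pool holds most of the data:

```python
# Rule-of-thumb PG sizing (the usual "~100 PGs per OSD" guidance).
# ASSUMPTION: 7 nodes x 5 OSDs = 35 OSDs, one dominant data pool.
import math

def suggested_pg_num(num_osds, pool_size, target_pgs_per_osd=100):
    """Round (osds * target / pool_size) up to the next power of two."""
    raw = num_osds * target_pgs_per_osd / pool_size
    return 2 ** math.ceil(math.log2(raw))

print(suggested_pg_num(35, 7))   # EC pool, size = k + m = 7  -> 512
```

By that guideline 512 PGs would already be plenty for this pool, though PG count alone would not explain a 100MB/s vs 600MB/s gap.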


Also, there seems to be another bug: sometimes the size of some objects
shows up as 512k instead of 1.5GB (s3cmd ls s3://bucket/testfile1).
However, running "s3cmd info" or "radosgw-admin object stat --bucket=bucket
--object=testfile1 | grep obj_size" shows the proper 1.5GB file size.

Thanks,
Fani
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
