On 10/4/18 7:04 AM, jes...@krogh.cc wrote:
Hi All.

First, thanks for the good discussion and strong answers I've gotten so far.

Current cluster setup is 4 OSD hosts x 10 x 12TB 7.2K RPM drives, 10GbitE,
metadata on rotating drives - 3x replication - 256GB memory and 32+ cores per
OSD host. Drives sit behind a PERC controller, each disk as a single-drive RAID0, with BBWC.

Planned changes:
- get 1-2 more OSD hosts
- experiment with EC-pools for CephFS (a rough sketch follows this list)
- move the MDS onto a separate host and metadata onto SSDs.
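
A rough sketch of what the EC-pool experiment could look like (profile/pool
names, k/m values, PG counts and the test directory are placeholders, and it
assumes a release new enough to allow EC data pools for CephFS):

  ceph osd erasure-code-profile set ec-k4m2 k=4 m=2 crush-failure-domain=host
  ceph osd pool create cephfs_data_ec 128 128 erasure ec-k4m2
  # EC pools need overwrite support before CephFS can write to them
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  ceph fs add_data_pool cephfs cephfs_data_ec
  # point a test directory at the new data pool
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /ceph/ec-test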

I'm still struggling to get "non-cached" performance up to "hardware"
speed - whatever that means. I run an "fio" benchmark using 10GB files, 16
threads and a 4M block size -- at which point I can "almost" fill the
10GbitE NIC on a sustained basis. In this configuration I would have expected
to be "way above" 10Gbit speed, so the NIC should be fully filled, not
"almost" filled - could that be metadata activity? But for reads of "big
files" that should not amount to much - right?
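
For reference, the fio run described above would look roughly like the
following - the exact command isn't quoted here, so the path and flags are
assumptions rather than the original invocation:

  fio --name=seqread --directory=/ceph/fio-test --rw=read --bs=4M \
      --size=10G --numjobs=16 --ioengine=libaio --direct=1 --group_reporting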

The above is actually OK for production, so it's not a big issue - just for
information.

Single-threaded performance is still a struggle.

Cold HDD (read from disk on the NFS-server end) / NFS performance:

jk@zebra01:~$ pipebench < /nfs/16GB.file > /dev/null
Summary:
Piped   15.86 GB in 00h00m27.53s:  589.88 MB/second


Local page cache (just to show it isn't the profiling tool that's the
limitation):
jk@zebra03:~$ pipebench < /nfs/16GB.file > /dev/null
Summary:
Piped   29.24 GB in 00h00m09.15s:    3.19 GB/second
jk@zebra03:~$

Now from the Ceph system:
jk@zebra01:~$ pipebench < /ceph/bigfile.file > /dev/null
Summary:
Piped   36.79 GB in 00h03m47.66s:  165.49 MB/second

Can block/stripe-size be tuned? Does it make sense?
Does read-ahead on the CephFS kernel-client need tuning?
What performance are other people seeing?
Other thoughts - recommendations?

On some of the shares we're storing pretty large files (GB size) and
need the backup to move them to tape - so ideally a single thread should be
able to feed an LTO6 drive at its full write speed.

40-ish 7.2K RPM drives should add up to more than the above, right?
This is the only load currently being put on the cluster, plus ~100MB/s of
recovery traffic.



The problem with single-threaded performance in Ceph is that it reads the spindles serially: you are effectively reading one drive at a time, so you see a single disk's performance minus all the overhead from Ceph, network, MDS, etc. You do not get the combined performance of the drives, only one drive at a time. The trick for Ceph performance is to get more spindles working for you simultaneously.


There are ways to get more performance out of a single thread:
- faster components in the path, i.e. faster disk/network/CPU/memory.
- larger pre-fetching/read-ahead: with a large enough read-ahead, more OSDs will participate in reads simultaneously. [1] shows a table of benchmarks with different read-ahead sizes (a mount example follows this list).
- erasure coding: while erasure coding adds latency versus replicated pools, you get more spindles involved in reading in parallel, so for large sequential loads erasure coding can be a benefit.
- some sort of extra caching scheme; I have not looked at cachefiles, but it may provide some benefit.
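
As a concrete illustration of the read-ahead point (untested; the monitor
address, client name and window size below are only placeholders), the kernel
client's read-ahead window can be set at mount time with the rasize option:

  # remount CephFS with a 64 MiB client read-ahead window (rasize is in bytes)
  mount -t ceph 10.0.0.1:6789:/ /ceph -o name=admin,secretfile=/etc/ceph/secret,rasize=67108864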


You can also play with the different CephFS client implementations: there is a FUSE client where you can experiment with different cache solutions, but generally the kernel client is faster.
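
If you do try ceph-fuse, its readahead behaviour can be tuned from ceph.conf;
a minimal sketch, with values that are examples rather than recommendations:

  [client]
      # readahead knobs honoured by the userspace client (ceph-fuse / libcephfs)
      client_readahead_min = 1048576
      client_readahead_max_bytes = 67108864
      client_readahead_max_periods = 8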

In RBD there is a fancy striping solution via --stripe-unit and --stripe-count. This would get more spindles running at once; perhaps consider using RBD instead of CephFS if it fits the workload.
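
A striped-image sketch, where the pool/image names and sizes are placeholders
(rbd create takes the size in MB by default, so this is a 100 GB image):

  # 64 KiB stripe units spread across 16 objects at a time, so a single
  # sequential reader keeps several OSDs busy simultaneously
  rbd create mypool/bigimage --size 102400 --stripe-unit 65536 --stripe-count 16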


[1] https://tracker.ceph.com/projects/ceph/wiki/Kernel_client_read_ahead_optimization

good luck
Ronny Aasen