Hi
* What is your CPU utilization like? Are any cores close to
saturation?
* If you use fio to test a raw FC LUN (i.e. prior to adding it as an OSD)
from your host using random 4k blocks and a high queue depth (32 or more),
do you get high IOPS? What is the disk utilization? CPU utilization?
* If you repeat the above test but, instead of testing 1 LUN, run
concurrent fio tests on all 5 LUNs on the host, does the aggregate IOPS
performance scale x5? Any resource issues?
* Does increasing /sys/block/sdX/queue/nr_requests help?
* Can you use active/active multipath?
* If the above gives good performance/resource utilization, would you
get better performance with more than 20 OSDs/LUNs in total, for
example 40 or 60? That should not cost you anything.
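
The raw-LUN tests above could be sketched roughly as follows. The device
names (/dev/sdb through /dev/sdf) are hypothetical placeholders for your FC
LUNs; only run raw-device tests before a LUN is added as an OSD, and note
that any write workload against a raw device destroys its data:

```shell
# Random 4k reads on a single raw LUN, queue depth 32
# (device names are hypothetical -- substitute your actual FC LUNs):
fio --name=single-lun --filename=/dev/sdb --rw=randread --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 \
    --time_based --group_reporting

# Same test across all 5 LUNs concurrently (colon-separated device list);
# aggregate IOPS should scale roughly x5 if nothing else saturates:
fio --name=all-luns \
    --filename=/dev/sdb:/dev/sdc:/dev/sdd:/dev/sde:/dev/sdf \
    --rw=randread --bs=4k --ioengine=libaio --direct=1 --iodepth=32 \
    --runtime=60 --time_based --group_reporting

# Watch disk and CPU utilization while fio runs:
iostat -xz 1

# Raise the block-layer queue depth for one LUN:
echo 1024 > /sys/block/sdb/queue/nr_requests

# Check whether multipath is active/active (a single "multibus" path
# group) rather than active/passive:
multipath -ll
```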
* I still think you can use a replica count of 1 in Ceph since your SAN
already has redundancy; it may be overkill to use both. I am not
trying to save space on the SAN but rather to reduce write latency on the
Ceph side.
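
If you did go that route, the change is just pool settings on the Ceph
side; a sketch, where the pool name "rbd" is a hypothetical placeholder:

```shell
# With size 1, Ceph keeps a single copy of each object and relies
# entirely on the SAN's own redundancy -- fewer replica writes, so
# lower write latency. min_size must be lowered before size.
ceph osd pool set rbd min_size 1
ceph osd pool set rbd size 1
```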
Maged
On 2018-07-06 20:19, Matthew Stroud wrote:
> Good to note about the replica set; we will stick with 3. We really aren't
> concerned about the capacity overhead, but about the additional IO that
> occurs during writes when there is an additional copy.
>
> To be clear, we aren't using Ceph in place of FC, nor the other way around.
> We have discovered that SAN storage is cheaper (this one was surprising to
> me) and performs better than direct-attached storage (DAS) at the small
> scale we are building at (20T to about 100T). I'm sure that would
> switch if we were much larger, but for now SAN is better. In summary, we
> are using the SAN pretty much as DAS, and Ceph uses those SAN disks for OSDs.
>
> The biggest issue we see is slow requests during rebuilds or node/OSD
> failures, while the disks and network just aren't being used to their
> fullest. That would lead me to believe that there are some host and/or OSD
> process bottlenecks going on. Other than that, just increasing the
> performance of our Ceph cluster would be a plus, and that is what I'm
> exploring.
>
> As for test numbers, I can't run those right now because the systems we
> have are in prod and I don't want to impact that for IO testing. However,
> we do have a new cluster coming online shortly; I could do some
> benchmarking there and get that back to you.
>
> However, if memory serves, we were only getting about 90-100k IOPS
> and about 15-50 ms latency with 10 servers running fio with a 50/50 mix of
> random and sequential workloads. With a single VM, we were getting about
> 14k IOPS with about 10-30 ms of latency.
>
> Thanks,
> Matthew Stroud
>
> On 7/6/18, 11:12 AM, "Vasu Kulkarni" <vakul...@redhat.com> wrote:
>
> On Fri, Jul 6, 2018 at 8:38 AM, Matthew Stroud <mattstr...@overstock.com>
> wrote:
>>
>> Thanks for the reply.
>>
>>
>>
>> Actually we are using Fibre Channel (it's so much more performant than
>> iSCSI in our tests) as the primary storage, and this is serving up traffic
>> for RBD for OpenStack, so this isn't for backups.
>>
>>
>>
>> Our biggest bottleneck is trying to utilize the host and/or OSD process
>> correctly. The disks are running at sub-millisecond latency, with about
>> 90% of the IO being served from the array's cache (i.e. not even hitting
>> the disks). According to the host, we never get north of 20% disk
>> utilization unless there is a deep scrub going on.
>>
>>
>>
>> We have debated dropping the replica size from 3 to 2. However, this
>> isn't much of a win for the Pure Storage array, which dedupes on the
>> backend, so extra copies of data are relatively free on that unit. A
>> replica of 1 wouldn't work because this is hosting a production workload.
>
> It is a mistake to use a replica count of 2 for production: when one of
> the copies is corrupted, it is hard to fix things. If you are concerned
> about storage overhead, there is an option to use EC pools in Luminous. To
> get back to your original question: if you are comparing network/disk
> utilization against FC numbers, that is the wrong comparison. They are two
> different storage systems with different purposes. Ceph is a scale-out
> object storage system, unlike FC systems; you can use commodity hardware
> and grow as you need, and you generally don't need HBA/FC-enclosed disks,
> though nothing stops you from using your existing system. You also
> generally don't need any RAID mirroring configuration in the backend,
> since Ceph will handle the redundancy for you. Scale-out systems have more
> work to do than traditional FC systems. There are minimal configuration
> options for bluestore. What kind of disk/network utilization slowdown are
> you seeing? Can you publish your numbers and test data?
>>
>> Thanks,
>>
>> Matthew Stroud
>>
>>
>> From: Maged Mokhtar <mmokh...@petasan.org>
>> Date: Friday, July 6, 2018 at 7:01 AM
>> To: Matthew Stroud <mattstr...@overstock.com>
>> Cc: ceph-users <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Performance tuning for SAN SSD config
>>
>>
>> On 2018-06-29 18:30, Matthew Stroud wrote:
>>
>> We back some of our Ceph clusters with SAN SSD disks, particularly VSP
>> G/F and Pure Storage. I'm curious what settings we should look into
>> modifying to take advantage of our SAN arrays. We had to manually set the
>> device class for the LUNs to SSD, which was a big improvement. However,
>> we still see situations where we get slow requests while the underlying
>> disks and network are underutilized.
>>
>>
>> More info about our setup: we are running CentOS 7 with Luminous as our
>> Ceph release. We have 4 OSD nodes with 5x2TB disks each, set up as
>> bluestore. Our ceph.conf is attached with some information removed for
>> security reasons.
>>
>>
>> Thanks ahead of time.
>>
>> Thanks,
>> Matthew Stroud
>>
>> ________________________________
>>
>>
>> CONFIDENTIALITY NOTICE: This message is intended only for the use and review
>> of the individual or entity to which it is addressed and may contain
>> information that is privileged and confidential. If the reader of this
>> message is not the intended recipient, or the employee or agent responsible
>> for delivering the message solely to the intended recipient, you are hereby
>> notified that any dissemination, distribution or copying of this
>> communication is strictly prohibited. If you have received this
>> communication in error, please notify sender immediately by telephone or
>> return email. Thank you.
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> If I understand correctly, you are using LUNs (via iSCSI) from your
>> external SAN as OSDs, created a separate pool with these OSDs with device
>> class SSD, and are using this pool for backup.
>>
>> Some comments:
>>
>> * Using external disks as OSDs is probably not that common. It may be
>> better to keep the SAN and Ceph cluster separate and have your backup
>> tool access both; it will also be safer in case of a disaster to the
>> cluster, since your backup will be on a separate system.
>> * What backup tool/script are you using? It is better if this tool uses
>> a high queue depth, large block sizes, and memory/page cache to increase
>> performance during copies.
>> * To pin down where your current bottleneck is, I would run benchmarks
>> (e.g. fio) using the block sizes used by your backup tool, on the raw
>> LUNs before they are added as OSDs (as pure iSCSI disks) as well as on
>> both the main and backup pools. Have a resource tool (e.g.
>> atop/sysstat/collectl) running during these tests to check for resource
>> saturation: disks %busy, cores %busy, io_wait.
>> * You can probably use a replica count of 1 for the SAN OSDs since they
>> include their own RAID redundancy.
>>
>> Maged
>>
>>
>>
>