Thank you for sharing your experience. Glad to hear that someone has already used this strategy and that it works well.
> On Oct 27, 2020, at 03:10, Reed Dier <[email protected]> wrote:
>
> Late reply, but I have been using what I refer to as a "hybrid" crush
> topology for some data for a while now.
>
> Initially with just rados objects, and later with RBD.
>
> We found that we were able to accelerate reads to roughly all-SSD
> performance levels, while bringing up the tail end of the write
> performance a bit. Write performance wasn't an orders-of-magnitude
> improvement, but the SSD-write + replicate-to-HDD cycle seemed to help
> reduce slow ops, etc.
>
> I will see if I can follow up with some rough benchmarks I can dig up.
>
> As for implementation, I have SSD-only hosts and HDD-only hosts,
> bifurcated at the root level of CRUSH.
>
>> {
>>     "rule_id": 2,
>>     "rule_name": "hybrid_ruleset",
>>     "ruleset": 2,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -13,
>>             "item_name": "ssd"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 1,
>>             "type": "host"
>>         },
>>         {
>>             "op": "emit"
>>         },
>>         {
>>             "op": "take",
>>             "item": -1,
>>             "item_name": "default"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": -1,
>>             "type": "chassis"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> },
>
> I don't remember having to do any primary affinity tweaking to make it
> work; it seemed to *just work* for the most part, with the SSD copy
> becoming the primary.

Yes, it should just work from my investigations, as long as you don't
change the primary affinity of the SSD OSDs.

> One thing to keep in mind is that I find balancer distribution to be a
> bit skewed due to the hybrid pools, though that could just be my
> perception. I've got 3x rep hdd, 3x rep hybrid, 3x rep ssd, and ec73
> hdd pools, so I have a somewhat wonky pool topology, and that could
> also lead to distribution issues.
>
> Hope this is helpful.
>
> Reed
>
>> On Oct 25, 2020, at 2:10 AM, [email protected] wrote:
>>
>> Hi all,
>>
>> We are planning a new pool to store our dataset using CephFS. These
>> data are almost read-only (but not guaranteed) and consist of a lot
>> of small files. Each node in our cluster has 1 * 1T SSD and 2 * 6T
>> HDD, and we will deploy about 10 such nodes. We aim at getting the
>> highest read throughput.
>>
>> If we just use a replicated pool of size 3 on SSD, we should get the
>> best performance; however, that leaves us with only 1/3 of the SSD
>> capacity usable. And EC pools are not friendly to such a small-object
>> read workload, I think.
>>
>> Now I'm evaluating a mixed SSD and HDD replication strategy. Ideally,
>> I want 3 data replicas, each on a different host (failure domain): 1
>> of them on SSD, the other 2 on HDD. And normally every read request
>> is directed to the SSD. So, if every SSD OSD is up, I'd expect the
>> same read throughput as the all-SSD deployment.
>>
>> I've read the documents and did some tests. Here is the crush rule
>> I'm testing with:
>>
>> rule mixed_replicated_rule {
>>     id 3
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default class ssd
>>     step chooseleaf firstn 1 type host
>>     step emit
>>     step take default class hdd
>>     step chooseleaf firstn -1 type host
>>     step emit
>> }
>>
>> Now I have the following conclusions, but I'm not very sure:
>> * The first OSD produced by CRUSH will be the primary OSD (at least
>> if I don't change the "primary affinity"). So, the above rule is
>> guaranteed to map an SSD OSD as the primary in each PG, and every
>> read request will go to the SSD if it is up.
>> * It is currently not possible to enforce that the SSD OSD and the
>> HDD OSDs are chosen from different hosts. So, if I want to ensure
>> data availability even if 2 hosts fail, I need to choose 1 SSD and 3
>> HDD OSDs. That means setting the replication size to 4, instead of
>> the ideal value 3, on the pool using the above crush rule.
>>
>> Am I correct about the above statements? How would this work from
>> your experience? Thanks.
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
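In case it is useful to anyone following this thread: as far as I know,
ceph osd crush rule create-replicated cannot express a rule with two
"take" steps, so a hybrid rule like the ones above has to be added to
the decompiled CRUSH map by hand. Roughly like this (a sketch only; the
rule id 3 and the file names are just the example values from above):

    # export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # ... add the mixed_replicated_rule shown above to crushmap.txt ...

    # recompile and dry-run the rule before injecting it; with --num-rep 4
    # the first OSD of every mapping should be one of the SSD OSDs
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --rule 3 --num-rep 4 --show-mappings | head
    crushtool -i crushmap.new --test --rule 3 --num-rep 4 --show-bad-mappings

    # inject the new map only once the test output looks sane
    ceph osd setcrushmap -i crushmap.new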

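To point a pool at the rule afterwards and double-check which OSD ends
up as the primary, something along these lines should work (again only
a sketch; the pool name, object name, and min_size value are
placeholders, not a recommendation):

    # size 4 = 1 SSD copy + 3 HDD copies, per the discussion above
    ceph osd pool set cephfs_data crush_rule mixed_replicated_rule
    ceph osd pool set cephfs_data size 4
    ceph osd pool set cephfs_data min_size 2

    # the first OSD in the up/acting set is the primary; with this rule and
    # default primary affinity it should always be one of the SSD OSDs
    ceph osd map cephfs_data some_object_name
    ceph pg ls-by-pool cephfs_data | head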