Late reply, but I have been using what I refer to as a "hybrid" crush topology
for some data for a while now.
Initially with just RADOS objects, and later with RBD.
We found that we were able to accelerate reads to roughly all-SSD performance
levels, while also bringing up the tail end of write performance a bit.
Writes didn't improve by orders of magnitude, but the SSD-write-then-replicate-
to-HDD cycle did seem to help reduce slow ops and the like.
I will see if I can follow up with some rough benchmarks I can dig up.
As for the implementation: I have SSD-only hosts and HDD-only hosts, bifurcated
at the root level of the CRUSH map.
> {
>     "rule_id": 2,
>     "rule_name": "hybrid_ruleset",
>     "ruleset": 2,
>     "type": 1,
>     "min_size": 1,
>     "max_size": 10,
>     "steps": [
>         {
>             "op": "take",
>             "item": -13,
>             "item_name": "ssd"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": 1,
>             "type": "host"
>         },
>         {
>             "op": "emit"
>         },
>         {
>             "op": "take",
>             "item": -1,
>             "item_name": "default"
>         },
>         {
>             "op": "chooseleaf_firstn",
>             "num": -1,
>             "type": "chassis"
>         },
>         {
>             "op": "emit"
>         }
>     ]
> },
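For reference, the two roots the rule takes from look something like this in
the decompiled map (host/chassis names and weights here are illustrative; only
the bucket ids match the dump above):

> root ssd {
>     id -13
>     alg straw2
>     hash 0  # rjenkins1
>     item ssd-host-1 weight 3.640
>     item ssd-host-2 weight 3.640
>     item ssd-host-3 weight 3.640
> }
> root default {
>     id -1
>     alg straw2
>     hash 0  # rjenkins1
>     item chassis-1 weight 32.740
>     item chassis-2 weight 32.740
>     item chassis-3 weight 32.740
> }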
I don't remember having to do any kind of primary-affinity tuning to make it
work; it seemed to *just work* for the most part, with the SSD copy becoming
the primary.
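If you want to double-check which OSD ends up primary for a PG in the hybrid
pool, something like this will show it (the PG id is just an example):

> ceph pg dump pgs_brief
> # the ACTING_PRIMARY column (first OSD in the acting set) is the primary
> ceph pg map 2.1a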
One thing to keep in mind is that I find the balancer's distribution to be a
bit skewed due to the hybrid pools, though that could just be my perception.
I've got 3x replicated HDD, 3x replicated hybrid, 3x replicated SSD, and
EC 7+3 HDD pools, so my pool topology is a bit wonky, and that could
contribute to the distribution issues as well.
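If you want to gauge the skew yourself, per-OSD fill relative to the CRUSH
tree and the balancer state are visible with:

> ceph osd df tree
> ceph balancer status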
Hope this is helpful.
Reed
> On Oct 25, 2020, at 2:10 AM, [email protected] wrote:
>
> Hi all,
>
> We are planning a new pool to store our dataset using CephFS. The data are
> almost read-only (but not guaranteed) and consist of a lot of small files.
> Each node in our cluster has 1 * 1T SSD and 2 * 6T HDD, and we will deploy
> about 10 such nodes. We are aiming for the highest read throughput.
>
> If we just use a replicated pool of size 3 on SSD, we should get the best
> performance; however, that only leaves us 1/3 of the SSD space as usable. And
> EC pools are not friendly to such a small-object read workload, I think.
>
> Now I’m evaluating a mixed SSD and HDD replication strategy. Ideally, I want
> 3 data replicas, each on a different host (the failure domain): 1 of them on
> SSD, the other 2 on HDD. Normally, every read request would be directed to
> the SSD, so if every SSD OSD is up, I’d expect the same read throughput as
> with an all-SSD deployment.
>
> I’ve read the documents and did some tests. Here is the crush rule I’m
> testing with:
>
> rule mixed_replicated_rule {
>     id 3
>     type replicated
>     min_size 1
>     max_size 10
>     step take default class ssd
>     step chooseleaf firstn 1 type host
>     step emit
>     step take default class hdd
>     step chooseleaf firstn -1 type host
>     step emit
> }
>
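> One way to dry-run the rule before attaching it to a pool is the usual
> crushtool round-trip (file names here are just examples):
>
> ceph osd getcrushmap -o crushmap.bin
> crushtool -d crushmap.bin -o crushmap.txt
> # append the rule above to crushmap.txt, then recompile and test:
> crushtool -c crushmap.txt -o crushmap.new
> crushtool -i crushmap.new --test --rule 3 --num-rep 4 --show-mappings
> ceph osd setcrushmap -i crushmap.new
>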
> Now I have the following conclusions, but I’m not very sure:
> * The first OSD produced by CRUSH will be the primary OSD (at least if I
> don’t change the “primary affinity”). So, the above rule is guaranteed to map
> an SSD OSD as the primary of each PG, and every read request will be served
> from the SSD as long as it is up.
> * It is currently not possible to enforce that the SSD and HDD OSDs are
> chosen from different hosts. So, if I want to keep data available even when 2
> hosts fail, I need to choose 1 SSD and 3 HDD OSDs. That means setting the
> replication size to 4, instead of the ideal value of 3, on the pool using the
> above crush rule (see the sketch below).
>
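> If that reasoning holds, attaching the rule to the pool would look roughly
> like this (pool name and PG counts are placeholders, not recommendations):
>
> ceph osd pool create cephfs_data 128 128 replicated mixed_replicated_rule
> ceph osd pool set cephfs_data size 4
> ceph osd pool set cephfs_data min_size 2
>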
> Am I correct about the above statements? How would this work from your
> experience? Thanks.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]