Thank you for sharing your experience. Glad to hear that someone has already used this strategy and that it works well.
> On Oct 27, 2020, at 03:10, Reed Dier <[email protected]> wrote:
>
> Late reply, but I have been using what I refer to as a "hybrid" crush
> topology for some data for a while now.
>
> Initially with just rados objects, and later with RBD.
>
> We found that we were able to accelerate reads to roughly all-SSD
> performance levels, while bringing up the tail end of the write
> performance a bit. Write performance wasn't an orders-of-magnitude
> improvement, but the SSD-write + replicate-to-HDD cycle seemed to help
> reduce slow ops, etc.
>
> I will see if I can follow up with some rough benchmarks I can dig up.
>
> As for implementation, I have SSD-only hosts and HDD-only hosts,
> bifurcated at the root level of CRUSH.
>
>> {
>>     "rule_id": 2,
>>     "rule_name": "hybrid_ruleset",
>>     "ruleset": 2,
>>     "type": 1,
>>     "min_size": 1,
>>     "max_size": 10,
>>     "steps": [
>>         {
>>             "op": "take",
>>             "item": -13,
>>             "item_name": "ssd"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": 1,
>>             "type": "host"
>>         },
>>         {
>>             "op": "emit"
>>         },
>>         {
>>             "op": "take",
>>             "item": -1,
>>             "item_name": "default"
>>         },
>>         {
>>             "op": "chooseleaf_firstn",
>>             "num": -1,
>>             "type": "chassis"
>>         },
>>         {
>>             "op": "emit"
>>         }
>>     ]
>> },
>
> I don't remember having to do any primary affinity tweaking to make it
> work; it seemed to *just work* for the most part, with the SSD copy
> becoming the primary.

Yes, it should just work from my investigations, as long as you don't
change the primary affinity of the SSD OSDs.

> One thing to keep in mind is that I find balancer distribution to be a
> bit skewed due to the hybrid pools, though that could just be my
> perception. I've got 3x rep hdd, 3x rep hybrid, 3x rep ssd, and ec73
> hdd pools, so I have a somewhat wonky pool topology, and that could
> also lead to distribution issues.
>
> Hope this is helpful.
>
> Reed
>
>> On Oct 25, 2020, at 2:10 AM, [email protected] wrote:
>>
>> Hi all,
>>
>> We are planning a new pool to store our dataset using CephFS. These
>> data are almost read-only (but not guaranteed) and consist of a lot
>> of small files. Each node in our cluster has 1 * 1T SSD and 2 * 6T
>> HDD, and we will deploy about 10 such nodes. We aim at getting the
>> highest read throughput.
>>
>> If we just use a replicated pool of size 3 on SSD, we should get the
>> best performance; however, that leaves us with only 1/3 of the SSD
>> capacity usable. And EC pools are not friendly to such a small-object
>> read workload, I think.
>>
>> Now I'm evaluating a mixed SSD and HDD replication strategy. Ideally,
>> I want 3 data replicas, each on a different host (failure domain): 1
>> of them on SSD, the other 2 on HDD. And normally every read request
>> is directed to the SSD. So, if every SSD OSD is up, I'd expect the
>> same read throughput as the all-SSD deployment.
>>
>> I've read the documents and did some tests. Here is the crush rule
>> I'm testing with:
>>
>> rule mixed_replicated_rule {
>>     id 3
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default class ssd
>>     step chooseleaf firstn 1 type host
>>     step emit
>>     step take default class hdd
>>     step chooseleaf firstn -1 type host
>>     step emit
>> }
>>
>> Now I have the following conclusions, but I'm not very sure:
>> * The first OSD produced by CRUSH will be the primary OSD (at least
>> if I don't change the "primary affinity"). So, the above rule is
>> guaranteed to map an SSD OSD as the primary in each PG, and every
>> read request will go to the SSD if it is up.
>> * It is currently not possible to enforce that the SSD OSD and the
>> HDD OSDs are chosen from different hosts. So, if I want to ensure
>> data availability even if 2 hosts fail, I need to choose 1 SSD and 3
>> HDD OSDs. That means setting the replication size to 4, instead of
>> the ideal value 3, on the pool using the above crush rule.
>>
>> Am I correct about the above statements? How would this work from
>> your experience? Thanks.
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
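In case it is useful to anyone following this thread: as far as I know,
ceph osd crush rule create-replicated cannot express a rule with two
"take" steps, so a hybrid rule like the ones above has to be added to
the decompiled CRUSH map by hand. Roughly like this (a sketch only; the
rule id 3 and the file names are just the example values from above):

    # export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # ... add the mixed_replicated_rule shown above to crushmap.txt ...

    # recompile and dry-run the rule before injecting it; with --num-rep 4
    # the first OSD of every mapping should be one of the SSD OSDs
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --rule 3 --num-rep 4 --show-mappings | head
    crushtool -i crushmap.new --test --rule 3 --num-rep 4 --show-bad-mappings

    # inject the new map only once the test output looks sane
    ceph osd setcrushmap -i crushmap.new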

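To point a pool at the rule afterwards and double-check which OSD ends
up as the primary, something along these lines should work (again only
a sketch; the pool name, object name, and min_size value are
placeholders, not a recommendation):

    # size 4 = 1 SSD copy + 3 HDD copies, per the discussion above
    ceph osd pool set cephfs_data crush_rule mixed_replicated_rule
    ceph osd pool set cephfs_data size 4
    ceph osd pool set cephfs_data min_size 2

    # the first OSD in the up/acting set is the primary; with this rule and
    # default primary affinity it should always be one of the SSD OSDs
    ceph osd map cephfs_data some_object_name
    ceph pg ls-by-pool cephfs_data | head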