On 19/04/17 21:08, Reed Dier wrote:
> Hi Maxime,
> 
> This is a very interesting concept. Instead of the primary affinity being 
> used to choose SSD for primary copy, you set crush rule to first choose an 
> osd in the ‘ssd-root’, then the ‘hdd-root’ for the second set.
> 
> And with 'step chooseleaf firstn {num}':
>> If {num} > 0 && < pool-num-replicas, choose that many buckets. 
> So 1 chooses that bucket
>> If {num} < 0, it means pool-num-replicas - {num}
> And -1 means it will fill the remaining replicas from this bucket (e.g. with 
> 3 replicas, -1 selects the remaining 2).
> 
> This is a very interesting concept, one I had not considered.
> Really appreciate this feedback.
> 
> Thanks,
> 
> Reed
> 
>> On Apr 19, 2017, at 12:15 PM, Maxime Guyot <[email protected]> wrote:
>>
>> Hi,
>>
>>>> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio,
>>> 1:4-5 is common but depends on your needs and the devices in question, ie. 
>>> assuming LFF drives and that you aren’t using crummy journals.
>>
>> You might be speaking about different ratios here. I think that Anthony is 
>> speaking about the journal:OSD ratio and Reed about the capacity ratio 
>> between the HDD and SSD tiers/roots. 
>>
>> I have been experimenting with hybrid setups (1 copy on SSD + 2 copies on 
>> HDD), like Richard says you’ll get much better random read performance with 
>> primary OSD on SSD but write performance won’t be amazing since you still 
>> have 2 HDD copies to write before ACK. 
>>
>> I know the doc suggests using primary affinity, but since it’s an OSD-level 
>> setting it does not play well with other storage tiers, so I searched for 
>> other options. From what I have tested, a rule that selects the 
>> first/primary OSD from the ssd-root then the rest of the copies from the 
>> hdd-root works. Though I am not sure it is *guaranteed* that the first OSD 
>> selected will be primary.
>>
>> rule hybrid {
>>  ruleset 2
>>  type replicated
>>  min_size 1
>>  max_size 10
>>  step take ssd-root
>>  step chooseleaf firstn 1 type host
>>  step emit
>>  step take hdd-root
>>  step chooseleaf firstn -1 type host
>>  step emit
>> }
>>
>> Cheers,
>> Maxime

FWIW splitting my HDDs and SSDs into two separate roots and using a crush rule 
to first choose a host from the SSD root and take remaining replicas on the HDD 
root was the way I did it, too. By inspection, it did seem that all PGs in the 
pool had an SSD for a primary, so I think this is a reliable way of doing it. 
You would of course end up with an acting primary on one of the slow spinners 
for a brief period if you lost an SSD for whatever reason and it needed to 
rebalance.
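For anyone who wants to sanity-check the firstn arithmetic without touching a
cluster, here is a minimal Python sketch of what the hybrid rule does. To be
clear, this is purely illustrative and is not real CRUSH placement (no
straw/hashing, and the OSD ids are made up); it only models the documented
semantics: the two emitted sets are concatenated, and the first OSD in the
combined list acts as primary.

```python
import random

def hybrid_select(ssd_osds, hdd_osds, replicas=3, seed=None):
    """Illustrative model of the hybrid rule (NOT real CRUSH):
    - 'step chooseleaf firstn 1' on ssd-root picks 1 OSD,
    - 'step chooseleaf firstn -1' on hdd-root picks replicas - 1 more
      (firstn with num <= 0 selects pool-num-replicas + num buckets),
    - the emitted sets are concatenated; entry 0 is the primary."""
    rng = random.Random(seed)
    acting = rng.sample(ssd_osds, 1)              # firstn 1 from ssd-root
    acting += rng.sample(hdd_osds, replicas - 1)  # firstn -1 from hdd-root
    return acting                                 # acting[0] = primary

# Hypothetical OSD ids: 0-2 are SSD OSDs, 3-8 are HDD OSDs.
acting = hybrid_select([0, 1, 2], [3, 4, 5, 6, 7, 8], replicas=3, seed=1)
# acting[0] always comes from the SSD list; the rest are HDD OSDs.
```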

The only downside is that if you have your SSD and HDD OSDs on the same 
physical hosts, I'm not sure how you set up your failure domains and rules to 
make sure that you don't take an SSD primary and an HDD replica on the same host. 
In my case, SSDs and HDDs are on different hosts, so it didn't matter to me.
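That colocation concern is easy to demonstrate numerically. A rough sketch
(hypothetical host map, and again not real CRUSH): because the two chooseleaf
steps run independently when both roots contain the same hosts, a sizeable
fraction of PGs will land the SSD primary and an HDD replica on the same host.

```python
import random

# Hypothetical layout: three hosts, each with one SSD OSD and one HDD OSD.
host_of = {0: "a", 1: "b", 2: "c",   # SSD OSDs (ssd-root)
           3: "a", 4: "b", 5: "c"}   # HDD OSDs (hdd-root)

rng = random.Random(0)
overlaps = 0
trials = 10000
for _ in range(trials):
    primary = rng.choice([0, 1, 2])       # firstn 1 from ssd-root
    replicas = rng.sample([3, 4, 5], 2)   # firstn -1 from hdd-root
    if host_of[primary] in {host_of[r] for r in replicas}:
        overlaps += 1

# With 3 hosts and 2 HDD replicas, the SSD primary's host is one of the
# 2-of-3 hosts chosen for HDD copies about 2/3 of the time.
print(overlaps / trials)
```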
-- 
Richard Hesketh

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
