On 05/04/2017 03:58 PM, Xavier Villaneau wrote:
> Hello Loïc,
>
> On Thu, May 4, 2017 at 8:30 AM Loic Dachary <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Is there a way to calculate the optimum nearfull ratio for a given
>     crushmap ?
>
> This is a question that I was planning to cover in those calculations I
> was working on for python-crush. I've currently shelved the work for a
> few weeks but intend to look at it again as time frees up.
Of course! Now I see how the two are related. Thanks.

> Basically, I see this as a five-fold uncertainty problem:
> 1. CRUSH mappings are pseudo-random and therefore (usually) uneven.
> 2. Object distribution between placement groups has the exact same issue.
> 3. Object size within a given pool can also vary greatly (from bytes to
>    megabytes).
> 4. Failures and the re-balancing that follows are also random.
> 5. Finally, pools can occupy different and overlapping sets of OSDs, and
>    hold independent sets of objects.
>
> Thanks to your new CRUSH tools, I think #1 and #4 are solved respectively
> by the ability to:
> - generate a CRUSH map for a precise (and even) distribution of PGs;
> - test mappings for every scenario of N failures and find the worst-case
>   scenario (a very expensive calculation, but possible).
>
> Issues #2 and #3 are trickier. The big picture is that a given amount of
> data is placed more evenly the more objects there are, and there should
> be a way to use statistics to quantify that. Variance in object size then
> brings in more uncertainty, but I think that metric is difficult to
> quantify outside of very specific use cases where object sizes are known.
>
> Finally, this might all be made redundant by the new auto-rebalancing
> feature that Sage is planning for Luminous. If we can assume even data
> placement at all times, then #4 is the only thing we need to worry about.
> For performance-based placement that would be very different, however.
> And if pools have overlapping OSD sets, that could be fairly tricky too.
>
> Maybe some other users here already have a rule of thumb or actual
> calculations for that. I was planning to get into the statistical
> calculations of data placement assuming uniform object size as the next
> step for the paper I am working on. Would there be a need for such tools?
>
> Regards,
> --
> Xavier Villaneau
> Storage Software Eng. at Concurrent Computer Corp.
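To put rough numbers on #2, here is a minimal balls-into-bins simulation using only the Python standard library. This is not python-crush; uniform random placement is a crude stand-in for CRUSH's pseudo-random mapping, and the bin/ball counts are made up. It shows how the fullest placement target overshoots the average, and how the overshoot shrinks as the number of objects grows:

```python
import random
import statistics

def max_overshoot(n_bins, n_balls, trials=20, seed=42):
    """Place n_balls uniformly at random into n_bins and return the
    worst ratio (fullest bin / average bin) seen over `trials` runs.
    A crude stand-in for objects landing in PGs, or PGs on OSDs."""
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        load = [0] * n_bins
        for _ in range(n_balls):
            load[rng.randrange(n_bins)] += 1
        worst = max(worst, max(load) / statistics.mean(load))
    return worst

# Few objects: the fullest target lands well above the average.
print(max_overshoot(n_bins=10, n_balls=100))
# Many objects: the imbalance shrinks toward 1.0.
print(max_overshoot(n_bins=10, n_balls=100000))
```

The connection to the nearfull question: if the fullest OSD ends up r times fuller than average, raw capacity can only be filled to about 1/r before that OSD is full, so a nearfull ratio would have to leave at least that much headroom.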
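The worst-case N-failures search mentioned for #4 can be sketched by brute force. The mapping model below is deliberately simplified and the toy data is made up: each PG keeps the first `size` surviving OSDs from a fixed candidate list, which only approximates how CRUSH actually remaps after a failure.

```python
import random
from itertools import combinations

def worst_case_peak(pg_candidates, size, n_failures):
    """Try every combination of n_failures dead OSDs; for each, map
    every PG onto the first `size` surviving OSDs in its candidate
    list and record the busiest OSD's PG count. Returns the highest
    peak found and the failure set that caused it. Cost grows as
    C(n_osds, n_failures) -- expensive, but exhaustive."""
    osds = sorted({o for cand in pg_candidates for o in cand})
    peak, worst_set = 0, ()
    for failed in combinations(osds, n_failures):
        dead = set(failed)
        load = {o: 0 for o in osds if o not in dead}
        for cand in pg_candidates:
            for o in [x for x in cand if x not in dead][:size]:
                load[o] += 1
        if max(load.values()) > peak:
            peak, worst_set = max(load.values()), failed
    return peak, worst_set

# Toy cluster: 6 OSDs, 12 PGs, 4 candidates per PG, 2 copies kept.
rng = random.Random(1)
pgs = [rng.sample(range(6), 4) for _ in range(12)]
print(worst_case_peak(pgs, size=2, n_failures=1))
```

A nearfull ratio derived from this would be sized so that even the worst failure set's peak load still fits on its OSD.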
--
Loïc Dachary, Artisan Logiciel Libre
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
