Makes sense. Would you mind if I take this job? I will start with the conversion function in the monitor.
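A minimal sketch of what that conversion might do, going only by Sage's description quoted below: give every rule a ruleset equal to its rule id and repoint the pools so each keeps using the same rule. `Rule`, `Pool`, `resolve` and `convert` are hypothetical names for illustration, not Ceph code. The sketch assumes rule ids are simply positions in the rule list (no gaps) and leaves out the min_size/max_size check that the real lookup performs.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    ruleset: int  # the "ruleset N" value written in the CRUSH map


@dataclass
class Pool:
    name: str
    crush_ruleset: int  # the ruleset the pool currently points at


def resolve(rules, ruleset):
    """Simplified old-style lookup: id of the first rule with this ruleset.

    Assumption: the size check (min_size/max_size) is omitted here.
    """
    for rule_id, rule in enumerate(rules):
        if rule.ruleset == ruleset:
            return rule_id
    return None


def convert(rules, pools):
    """Rewrite rules and pools in place so that ruleset == rule id."""
    # Record which rule each pool resolves to before changing anything.
    targets = {pool.name: resolve(rules, pool.crush_ruleset) for pool in pools}
    # Give every rule the ruleset equal to its position (its rule id).
    for rule_id, rule in enumerate(rules):
        rule.ruleset = rule_id
    # Repoint each pool at the same rule as before, now by rule id.
    for pool in pools:
        if targets[pool.name] is not None:
            pool.crush_ruleset = targets[pool.name]


# Loic's swapped example: "metadata" is listed first but keeps ruleset 1.
rules = [Rule("metadata", 1), Rule("data", 0)]
pools = [Pool("mypool1", 0)]
convert(rules, pools)
```

After the call, `mypool1` still resolves to the `data` rule, but a direct index into the rule list is enough to find it. Rules that share a ruleset each end up with their own number; a pool that pointed at a shared ruleset stays with the first matching rule, which is a simplification of what the size-aware lookup would pick.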
On 2014-8-9, at 0:08, "Sage Weil" <[email protected]> wrote:

> On Fri, 8 Aug 2014, Chen, Xiaoxi wrote:
>> From my side, I have seen many people (actually more than 80% of the users
>> I have seen in universities and companies) proceed as follows:
>>
>> 0. What they want to do is: create a pool that is located in a specified
>>    rack.
>>
>> 1. Create a rule with "ceph osd crush rule create-simple myrule1 rack2 host
>>    firstn"
>> 2. Create a pool named mypool1
>> 3. Set the ruleset of the pool, but they aren't that clear about the
>>    difference between the concepts of rule and ruleset... Here they just
>>    need an ID, so they use "ceph osd crush rule ls" to list all the rules
>>    (they may imagine rule == ruleset), and then start to count: 0, 1, 2, 3,
>>    aha, the ID for myrule1 is 3. So they simply type
>>    "ceph osd pool set mypool1 crush_ruleset 3"...
>>
>> In most cases this works, but it is actually not the right way to do it...
>
> Yeah, this is exactly the sort of flow we should fix.
>
> How about this:
>
> Starting with Giant (or whatever), we enforce that ruleset == rule id.
> That is, the ruleset must be unique. We have a conversion function in the
> monitor that will take an existing OSDMap (w/ embedded CRUSH map) and make
> any changes needed to make this true by giving a new ruleset to any
> rules that share one, and adjusting the pools accordingly. Moving forward,
> the mon will refuse to accept an injected CRUSH map that doesn't satisfy
> this constraint, and the new crushtool will refuse to compile one.
>
> We set a flag on the OSDMap indicating that this invariant (one rule per
> ruleset and rule id == ruleset id) is now true. If this flag is set, the
> mapping code can skip the old rule resolution (which searches all rules
> for a rule with the right ruleset and size) and look up the rule directly.
>
> We also adjust crushtool decompile to say
>
>   rule replicated_ruleset {
>           id 0   # do not change unnecessarily
>           type replicated
>
> instead of
>
>   rule replicated_ruleset {
>           ruleset 0
>           type replicated
>
> We can continue to recognize "ruleset" instead of "id" when compiling but
> issue a warning. We can also drop the min/max size values when
> decompiling and ignore them when compiling (and always set them to large
> numbers, like 0/255).
>
> Then we adjust all of the mon commands to take rule ids (== ruleset ids)
> or rule name.
>
> What do you think?
> sage
>
>
>>
>> -----Original Message-----
>> From: Loic Dachary [mailto:[email protected]]
>> Sent: Friday, August 8, 2014 11:11 PM
>> To: Chen, Xiaoxi
>> Cc: Sage Weil; Ma Jianpeng; Ceph Development
>> Subject: Re: Resolving the ruleno / ruleset confusion
>>
>> I guess most users just think of ruleset and never find out there are two
>> different numbers with slightly different semantics.
>>
>> For me the use case is, 100% of the time: creating a rule via the command
>> line and getting the ruleset via dump, OR creating and updating a rule via
>> dump / load of the osdmap, in which case I diligently (for no reason, just
>> because it seemed right) increment the ruleset and keep them in order.
>>
>> I have no use for rule ids and only use rulesets.
>>
>> Cheers
>>
>> On 08/08/2014 16:54, Chen, Xiaoxi wrote:
>>> I think before we start fixing bugs or trying to get rid of the ruleset
>>> concept, we should start by defining a reasonable use case: how we expect
>>> users to work with rules and pools. There is no CLI to create/modify a
>>> ruleset; even worse, you cannot get the ruleset id without dumping a rule.
>>>
>>> Currently the logic of the command flow is really strange: the user writes
>>> a rule, and when he wants to use the rule, he needs to find out which
>>> ruleset contains the rule and specify that ruleset for the pool.
>>> If the ruleset contains only one rule, the concept of ruleset is confusing
>>> and useless; if the ruleset contains more than one rule, the user runs the
>>> risk that ceph selects a rule in the ruleset other than the one he wants...
>>>
>>>
>>>
>>> On 2014-8-8, at 22:34, "Loic Dachary" <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On 08/08/2014 16:12, Sage Weil wrote:
>>>>> On Fri, 8 Aug 2014, Loic Dachary wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As you noticed, there are places where ruleset and ruleno / ruleid
>>>>>> are used interchangeably although they are not interchangeable. This
>>>>>> is a source of subtle bugs that can be hard to trace. By default
>>>>>> ruleid and ruleset are the same, but dumping a crush map including
>>>>>>
>>>>>> rule data {
>>>>>>         ruleset 0
>>>>>>         type replicated
>>>>>>         min_size 1
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>> rule metadata {
>>>>>>         ruleset 1
>>>>>>         type replicated
>>>>>>         min_size 1
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>>
>>>>>> and swapping the rules as follows
>>>>>>
>>>>>> rule metadata {
>>>>>>         ruleset 1
>>>>>>         type replicated
>>>>>>         min_size 1
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>>
>>>>>> rule data {
>>>>>>         ruleset 0
>>>>>>         type replicated
>>>>>>         min_size 1
>>>>>>         max_size 10
>>>>>>         step take default
>>>>>>         step chooseleaf firstn 0 type host
>>>>>>         step emit
>>>>>> }
>>>>>>
>>>>>> will have ruleset 1 with rule id 0 and ruleset 0 with rule id 1.
>>>>>>
>>>>>> Since the ruleset is the only reliable number from the user's point
>>>>>> of view, we could simply change CrushWrapper.h to never return the
>>>>>> rule id and assume only rulesets are given as arguments, even where
>>>>>> it currently claims to be a rule id.
>>>>>
>>>>> I'm worried about making that sort of change in an internal interface.
>>>>> And, more generally, about CRUSH maps in the wild that may have odd
>>>>> mappings that we don't want to break with subtle changes (even
>>>>> fixes). :/
>>>>>
>>>>>> The downside is that looking up the ruleset implies iterating over
>>>>>> all the rules, but that's probably not an issue.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> I sat down a few months ago and tried to figure out if we could get
>>>>> rid of the ruleset concept entirely and simply map pools directly to
>>>>> rules (which are the things the user conceptually thinks about, we
>>>>> name, etc.).
>>>>> The original motivation for a ruleset was to be able to adjust the
>>>>> pool replication factor and have the system adjust the placement
>>>>> behavior accordingly, but in reality that is a pretty useless
>>>>> capability: num_rep rarely changes, and when it does you can simply
>>>>> adjust the placement rule at the same time. Unfortunately, I didn't
>>>>> come up with any easy and clean way to do it and gave up.
>>>>>
>>>>> I think we should try again. Getting rid of this particular wart
>>>>> will save us a lot of confusion and complexity and improve the
>>>>> user/admin experience significantly...
>>>>>
>>>>> My suspicion is that we may need to have an explicit 'upgrade'
>>>>> validation step that rejiggers an existing CRUSH map to remap
>>>>> ruleids and rulesets to map to each other, and enforce that
>>>>> constraint on the cluster. Then we could get away with renaming the
>>>>> field and clean up all the admin tools and such based on that
>>>>> constraint. Then, in a year or two, we can change the actual
>>>>> placement code to drop the ruleset logic. Otherwise we'll need to
>>>>> set incompatible feature bits and force clients to update and so on,
>>>>> which we want to avoid...
>>>>
>>>> Understood. Even before going into this, it looks like we need a way to
>>>> find all bugs like http://tracker.ceph.com/issues/9044 and fix them.
>>>> Reading the code won't be enough I'm afraid.
>>>> What about changing ruleno and ruleset into structs so that compilation
>>>> shows where they are used interchangeably when they should not be?
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
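To illustrate Loic's suggestion of turning ruleno and ruleset into distinct types, here is a small sketch. `RuleId`, `RulesetId` and `rule_name` are hypothetical names, not taken from the Ceph source, which is C++; there the mismatch would surface at compile time. Python has no compile step, so this sketch assumes a static checker such as mypy would play that role, with a run-time check as a fallback.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RuleId:
    value: int  # position of the rule in the CRUSH map


@dataclass(frozen=True)
class RulesetId:
    value: int  # the "ruleset N" number written inside the rule


def rule_name(names: Sequence[str], rule_id: RuleId) -> str:
    """Return the name of a rule; accepts a RuleId, never a RulesetId."""
    # A static checker flags a RulesetId argument from the annotation alone;
    # this check catches the same mistake at run time.
    if not isinstance(rule_id, RuleId):
        raise TypeError(f"expected RuleId, got {type(rule_id).__name__}")
    return names[rule_id.value]


names = ["metadata", "data"]
first = rule_name(names, RuleId(0))

try:
    rule_name(names, RulesetId(0))  # the kind of mix-up behind issue 9044
    mixed_up_accepted = True
except TypeError:
    mixed_up_accepted = False
```

Because the two wrappers are separate classes, `RuleId(0)` and `RulesetId(0)` never compare equal, so a ruleset number cannot silently stand in for a rule id.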
