On Thu, 2020-02-13 at 15:11 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 12 Feb 2020 15:11:41 -0600
> Ken Gaillot <kgail...@redhat.com> wrote:
> ...
> > > INT_MAX would set the working interval to ±2GB. Producing 2GB worth
> > > of data in a few seconds/minutes is possible, but considering the
> > > minimal XLOG record, this would push it to 48GB. Good enough I
> > > suppose.
> > >
> > > INT64_MAX would set the working interval to...±8EB. Here, no matter
> > > what you choose as the master score, you still have some safety :)
> > >
> > > So do you think this is something worth working on? Are there some
> > > traps on the way that forbid using INT_MAX or INT64_MAX? Should I
> > > try to build a PoC to discuss it?
> >
> > I do think it makes sense to expand the range, but there are some
> > fuzzy concerns. One reason the current range is so small is that it
> > allows summing a large number of scores without worrying about the
> > possibility of integer overflow.
>
> Interesting, makes sense.
>
> > I'm not sure how important that is to the current code, but it's
> > something that would take a lot of tedious inspection of the
> > scheduler code to make sure it would be OK.
>
> I suppose adding some regression tests for each bug reported is the
> policy?

Yes, but that has fairly small coverage of the code. Scores are used so
extensively that we'd have to check everywhere they're used to make
sure they can handle the change.

> > In principle a 64-bit range makes sense to me. I think "INFINITY"
> > should be slightly less than half the max so at least two scores
> > could be added without concern, and then we could try to ensure that
> > we never add more than 2 scores at a time (using a function that
> > checks for infinity).
>
> A few weeks ago, Lars Ellenberg pointed me to merge_weight while
> discussing this same issue on #clusterlabs:
>
> https://github.com/ClusterLabs/pacemaker/blob/master/lib/pengine/common.c#L397
>
> I suppose it's a good first starting point.

Yep, that's what I had in mind.
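To make that concrete, the kind of helper I have in mind would look
roughly like this -- only a sketch, the names (SCORE_INFINITY,
score_add) are made up and this is not what merge_weight does today; it
just assumes "infinity" is defined slightly below half of INT64_MAX so
a single addition can never overflow:

#include <stdint.h>

/* Hypothetical 64-bit score limit: "infinity" sits slightly below half
 * of INT64_MAX, so adding any two in-range scores cannot overflow. */
#define SCORE_INFINITY  ((INT64_MAX / 2) - 1)

/* Overflow-safe addition of two scores. Minus infinity is "sticky" and
 * wins over plus infinity; finite sums are clamped to +/-infinity. */
static int64_t
score_add(int64_t a, int64_t b)
{
    if (a <= -SCORE_INFINITY || b <= -SCORE_INFINITY) {
        return -SCORE_INFINITY;
    }
    if (a >= SCORE_INFINITY || b >= SCORE_INFINITY) {
        return SCORE_INFINITY;
    }

    /* Both operands are finite (each below SCORE_INFINITY in absolute
     * value), so the sum fits comfortably in int64_t. */
    int64_t sum = a + b;

    if (sum > SCORE_INFINITY) {
        return SCORE_INFINITY;
    }
    if (sum < -SCORE_INFINITY) {
        return -SCORE_INFINITY;
    }
    return sum;
}

Callers would go through score_add() (or a small family of such
helpers) instead of adding scores directly, which is how the "never add
more than 2 scores at a time" rule would stay enforceable in one place.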
> > Alternatively if we come up with a code object for "score" that has
> > a 64-bit int and a separate bit flag for infinity, we could use the
> > full range.
>
> A bit less than half of 2^63 is already 2EB and the arithmetic stays
> simple. But a score object wouldn't be too difficult either. I would
> give the former a try first, then extend it to the latter if it feels
> close enough, as the hard part would already be done.
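For the record, such a score object wouldn't need to be much more than
the following -- again only a sketch with invented names (score_t,
score_obj_add), not a committed design:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical score object: the full int64_t range is usable because
 * infinity is carried in a separate flag instead of a sentinel value. */
typedef struct {
    int64_t value;     /* finite score; only the sign matters when
                        * infinite is true */
    bool    infinite;  /* true for +/-INFINITY */
} score_t;

/* Overflow-safe addition of two score objects, with minus infinity
 * winning over plus infinity (the usual convention in score arithmetic). */
static score_t
score_obj_add(score_t a, score_t b)
{
    bool neg_inf = (a.infinite && a.value < 0) || (b.infinite && b.value < 0);

    if (neg_inf || a.infinite || b.infinite) {
        return (score_t) { .value = neg_inf ? -1 : 1, .infinite = true };
    }

    /* Detect overflow of the finite sum without undefined behavior, and
     * promote it to infinity of the appropriate sign. */
    if ((b.value > 0 && a.value > INT64_MAX - b.value)
        || (b.value < 0 && a.value < INT64_MIN - b.value)) {
        return (score_t) { .value = (b.value > 0) ? 1 : -1,
                           .infinite = true };
    }

    return (score_t) { .value = a.value + b.value, .infinite = false };
}

The extra flag costs a comparison per operation, but it means the score
limits no longer have anything to do with the integer width, which is
the appeal of this variant.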
> > Unfortunately any change in the score will break backward
> > compatibility in the public C API, so it will have to be done when
> > we are ready to release a bunch of such changes.
>
> I'm not familiar with this C API. Any pointer?

It's (somewhat) documented at:

https://clusterlabs.org/pacemaker/doxygen/

The core and scheduler APIs might be the only ones affected. An example
is the pe_node_t type, which currently has an "int weight" member, or
pe_resource_t with "int stickiness". There are some public functions
that use an int score too, such as char2score() and score2char(). Not
that anyone actually uses the C API, but since we do make it available,
we have to be careful with backward compatibility.

> > It would likely be a "2.1.0" release, and probably not until 1-2
> > years from now. At least that gives us time to investigate and come
> > up with a design.
>
> Sure, I'm not in a rush here anyway, that mail was in my drafts
> since... 6 or 12 months maybe? :)
>
> > > Besides this master score limit, we suffer from these other
> > > constraints:
> > >
> > > * attrd_updater is highly asynchronous:
> > >   * values are not yet available locally when the command exits
> > >   * ...nor are they from remote nodes
> > >   * we had to wrap it in a loop that waits for the change to
> > >     become available locally.
> >
> > There's an RFE to offer a synchronous option to attrd_updater --
> > which you knew already since you submitted it :) but I'll mention it
> > in case anyone else wants to follow it:
> >
> > https://bugs.clusterlabs.org/show_bug.cgi?id=5347
>
> I forgot about this one :)
> I found a way to put a bandaid on this in PAF anyway.
>
> > It is definitely a goal, the question is always just developer time.
>
> sure
>
> > > * notification actions' return codes are ignored
> > >   [..this is discussed in another thread..]
> > >
> > > * OCF_RESKEY_CRM_meta_notify_* are available (officially) only
> > >   during notification actions
> >
> > That's a good question, whether the start/stop should be guaranteed
> > to have it as well.
>
> and promote/demote.
>
> > One question would be whether to use the pre- or post-values.
>
> I would vote for pre-. When the action is called for e.g. a start, the
> resource is not started yet, so it should still appear in e.g.
> "inactive".

> > Not directly related, but in the same vein, Andrew Beekhof proposed
> > a new promotable clone type, where promotion scores are discovered
> > ahead of time rather than after starting instances in slave mode.
> > The idea would be to have a new "discover" action in resource agents
> > that would output the master score (which would be called before
> > starting any instances),
>
> This is an appealing idea. I'm not sure why a new type of promotable
> clone would be required though. Adding this new operation to the
> existing OCF specs for promotable clones would be enough, wouldn't it?
> As long as the RA exposes this operation in its meta-data, the PEngine
> can decide to use it whenever it needs to find some clone to promote.
> Not just before starting the resource, e.g. even after a primary loss
> when all the secondaries are already up.

Makes sense

> This would greatly help to keep the RA code clean and simple, with
> very little cluster-related logic.
>
> I love the idea :)
>
> > and then on one instance selected to be promoted, another new action
> > (like "bootstrap") would be called to do some initial start-up that
> > all instances need, before the cluster started all the other
> > instances normally (whether start or start+promote for multi-master).
> > That would be a large effort -- note Beekhof was not volunteering to
> > do it. :)
>
> Not convinced about this one, not sure how useful this would be given
> my very limited use case though. If the idea is to e.g. provision
> secondaries, I think this is far from the responsibility of the RA or
> the cluster itself. I'm not even sure some kind of e.g. failback
> action would do. But maybe I misunderstood the idea.

I forget the particular service where it would be helpful. I'll have to
ask Beekhof again.

> Regards,
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/developers

ClusterLabs home: https://www.clusterlabs.org/