On Fri, 2019-04-05 at 17:19 +0200, Lars Ellenberg wrote:
> On Fri, Apr 05, 2019 at 09:56:51AM -0500, Ken Gaillot wrote:
> > On Fri, 2019-04-05 at 09:44 -0500, Ken Gaillot wrote:
> > > On Fri, 2019-04-05 at 15:50 +0200, Lars Ellenberg wrote:
> > > > As mentioned in #clusterlabs,
> > > > but I think I'll post it here, so it won't get lost:
> > > >
> > > > pacemaker 1.1.19, in case that matters.
> > > >
> > > > "all good".
> > > >
> > > > provoking a resource monitoring failure
> > > > (manual umount of some file system)
> > > >
> > > > monitoring failure triggers a pengine run
> > > > (the input here has no fail-count in the status section yet,
> > > > but does have the failed monitoring operation)
> > > > and results in a "Recovery" transition.
> >
> > Ah, I misunderstood ... I was thinking the two writes were the fail
> > count and the last failure time node attributes, but that's not it
> > (and probably already handled sufficiently).
> >
> > The two writes are the recording of the operation status result,
> > followed by the recording of the fail count and last failure time
> > node attributes.
> >
> > That's more challenging. The controller needs to see the status
> > result to determine whether it was a failure or not, at which point
> > it sends the fail count change to the attribute manager, which
> > writes it to the CIB, and the controller can only act on it then.
> >
> > A possible solution would be to skip running the scheduler for the
> > status result if we're sending a fail count change. But that's
> > risky because it assumes everything involved in the fail count
> > change will succeed, and in a reasonable time. That's why the
> > current approach is to run the scheduler immediately and then
> > re-run when the fail-count change is confirmed.
>
> I see. So there is that.
> We can live with that, I guess;
> it is very unlikely to cause issues "in the real world".
>
> > > > which is then aborted by the fail-count=1 update of the very
> > > > failure that this recovery transition is about.
> > > >
> > > > Meanwhile, the "stop" operation was already scheduled,
> > > > and results in "OK", so the second pengine run
> > > > now has as input a fail-count=1 and a stopped resource.
> > > >
> > > > The second pengine run would usually come to the same result,
> > > > minus already completed actions, and no one would notice.
> > > > I assume it has been like that for a long time?
> > >
> > > Yep, that's how it's always worked.
> > >
> > > > But in this case, someone tried to be smart
> > > > and set a migration-threshold of "very large";
> > > > in this case the string in the XML was 999999999999,
> > > > and that probably is "parsed" into some negative value,
> > >
> > > Anything above "INFINITY" (actually 1,000,000) should be mapped
> > > to INFINITY. If that's not what happens, there's a bug. Running
> > > crm_simulate in verbose mode should be helpful.
>
> I think I found it already.
>
> char2score() does crm_parse_int(),
> and reasonably assumes that the result is the parsed int.
> Which it is not, if the result is -1 and errno is set to EINVAL or
> ERANGE ;-)
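(For illustration, a minimal standalone sketch of the overflow being
described here, using plain strtoll(). The helper name
naive_parse_int() is a stand-in for the kind of "parse to int, return
-1 on error" conversion discussed; it is not the actual
crm_parse_int() source.)

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* strtoll() happily parses "999999999999", but the value is
     * greater than INT_MAX, so the helper flags ERANGE and returns
     * its error value of -1. */
    static int
    naive_parse_int(const char *text)
    {
        long long value;

        errno = 0;
        value = strtoll(text, NULL, 10);
        if (errno == 0 && (value > INT_MAX || value < INT_MIN)) {
            errno = ERANGE;
        }
        if (errno != 0) {
            return -1;   /* caller sees -1, i.e. a negative threshold */
        }
        return (int) value;
    }

    int
    main(void)
    {
        int migration_threshold = naive_parse_int("999999999999");

        /* Prints -1: the "very large" threshold has become negative */
        printf("migration_threshold = %d\n", migration_threshold);
        return 0;
    }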
Ahhhhh :)

> char2score -> crm_parse_int
> "999999999999" -> result of strtoll is > INT_MAX,
> result -1, errno ERANGE
> migration_threshold = -1;
>
> Not sure what to do there, though.
> Yet another helper,
> mapping ERANGE to the appropriate MIN/MAX for the conversion?

Yes, but we don't need another helper ... it looks like the problem is
that crm_parse_int() maps ERANGE to -1. At that point, it has the value
from crm_parse_ll(), which should be LONG_MIN or LONG_MAX, so it should
map it to INT_MIN or INT_MAX instead. I'll take care of that.

> But any "sane" configuration would not even trigger that.
> Where and how would we point out the "in-sane-ness" to the user,
> though?

crm_int_helper() will log an error if anything goes wrong, so I think
that's covered as far as we can. Did you check to see if there was a
"Conversion ... was clipped" log message in your case? (We need the
above fix to actually clip it in this case instead of blowing up, but
at least there's a message.)

> > > > which means the fail-count=1 now results in "forcing away ...",
> > > > different resource placements,
> > > > and the file system placement elsewhere now results in many
> > > > more actions, demoting/role changes/movement of other dependent
> > > > resources ...
> > > >
> > > > So I think we have two issues here:
> > > >
> > > > a) I think the fail-count update should be visible as input in
> > > > the CIB before the pengine calculates the recovery transition.
> > >
> > > Agreed, but that's a nontrivial change to the current design.
> > >
> > > Currently, once the controller detects a failure, it sends a
> > > request to the attribute manager to update the fail-count
> > > attribute, then immediately sends another request to update the
> > > last-failure time. The attribute manager writes these out to the
> > > CIB as it gets them, the CIB notifies the controller as each
> > > write completes, and the controller calls the scheduler when it
> > > receives each notification.
> > >
> > > The ideal solution would be to have the controller send a single
> > > request to the attrd with both attributes, and have attrd write
> > > the values together. However, the attribute manager's current
> > > client/server protocol and inner workings are designed for one
> > > attribute at a time. Writes can be grouped opportunistically or
> > > when dampening is explicitly configured, but the main approach
> > > is to write each change as it is received.
> > >
> > > There would also need to be some special handling for
> > > mixed-version clusters (i.e. keeping it working during a rolling
> > > upgrade when some nodes have the new capability and some don't).
> > >
> > > It's certainly doable but a medium-sized project. If there are
> > > volunteers I can give some pointers :) but there's quite a
> > > backlog of projects at the moment.
> > >
> > > > b) migration-threshold (and possibly other scores) should be
> > > > properly parsed/converted/capped/scaled/rejected
> > >
> > > That should already be happening
--
Ken Gaillot <kgail...@redhat.com>
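(For reference, a minimal sketch of the clamping approach discussed
above: saturate to INT_MIN/INT_MAX on a range error, then cap the
score at +/-1,000,000, i.e. Pacemaker's INFINITY. The helper names
clamped_parse_int() and clamp_to_score() are illustrative stand-ins,
not the actual crm_parse_int()/char2score() implementations.)

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SCORE_INFINITY 1000000   /* Pacemaker's "INFINITY" score */

    /* On out-of-range input, saturate to INT_MIN/INT_MAX based on the
     * sign of the wide value, instead of returning an error value of
     * -1. */
    static int
    clamped_parse_int(const char *text)
    {
        long long value;

        errno = 0;
        value = strtoll(text, NULL, 10);
        if (errno == ERANGE || value > INT_MAX) {
            return (value < 0) ? INT_MIN : INT_MAX;
        }
        if (value < INT_MIN) {
            return INT_MIN;
        }
        return (int) value;
    }

    /* Score conversion then caps anything beyond +/-1,000,000 at
     * INFINITY, so a migration-threshold of "999999999999" stays a
     * huge positive score rather than turning negative. */
    static int
    clamp_to_score(int value)
    {
        if (value >= SCORE_INFINITY) {
            return SCORE_INFINITY;
        }
        if (value <= -SCORE_INFINITY) {
            return -SCORE_INFINITY;
        }
        return value;
    }

    int
    main(void)
    {
        /* Prints 1000000 rather than -1 */
        printf("%d\n", clamp_to_score(clamped_parse_int("999999999999")));
        return 0;
    }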