On Tue, 2019-10-15 at 13:08 +0200, Tony den Haan wrote:
> Hi,
> I ran into getting "error 1" from portblock, so OCF_ERR_GENERIC,
> which for me doesn't guarantee the error was RC from portblock or
> pacemaker itself.
> Wouldn't it be quite useful to
> 1) give the agents a unique number to add to the OCF RC code, thus
> helping to determine origin of error
> 2) show an actual error string instead of "unknown error(1)". This is
> the last you want to see when a cluster is stuck.
>
> Tony

I agree it's an issue, but the exit codes have to stay fairly generic.
There are only 255 possible nonzero exit codes, and most shells use the
upper half of that range (values above 128) to report termination by a
signal. Meanwhile, there are dozens of agents. More importantly,
Pacemaker needs standard meanings to know how to respond.

However, there are possibilities:

- OCF could add a few more codes for common error conditions. (This
  requires updating the standard, as well as updating software such as
  Pacemaker to be aware of them.)

- OCF already supports an arbitrary string "exit reason", which
  Pacemaker will display in addition to the generic "unknown error".
  It's up to the individual agents to support this, and all of them
  should. Agents can get as specific as they like with exit reasons.

- Agents can also log to the system log, or print error output, which
  Pacemaker will record in its detail log. Many already provide good
  information this way, but there's always room for improvement.
-- 
Ken Gaillot <kgail...@redhat.com>
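
To illustrate the "exit reason" option above, here is a minimal sketch of how an agent might pair a generic return code with a specific reason. It assumes, as I understand the resource-agents shell helper ocf_exit_reason to work, that the reason is written to stderr with a prefix taken from OCF_EXIT_REASON_PREFIX, falling back to "ocf-exit-reason:". The names exit_reason and start, and the firewall-rule message, are hypothetical and not taken from portblock.

```python
import os
import sys

# Standard OCF return codes (subset).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def exit_reason(message, stream=sys.stderr):
    """Write a human-readable exit reason for the cluster manager to pick up.

    Assumption: the prefix convention mirrors the shell helper
    ocf_exit_reason (environment variable first, then a default).
    """
    prefix = os.environ.get("OCF_EXIT_REASON_PREFIX") or "ocf-exit-reason:"
    print(f"{prefix}{message}", file=stream)


def start(rule_applied, stream=sys.stderr):
    """Hypothetical start action: generic code, specific reason."""
    if not rule_applied:
        exit_reason("could not apply firewall rule", stream)
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

The return code stays generic so Pacemaker knows how to respond, while the string carries the agent-specific detail.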