Re: How to report a death by signal ?
On 02/18/15 14:20, Laurent Bercot wrote:
> On 18/02/2015 14:04, Olivier Brunel wrote:
>> I don't follow, what's wrong with using a fd?
>
> It needs a convention between G and P. And I can't do that, because G and P are not necessarily both execline commands. They are normal Unix programs, and the whole point of execline is to have commands that work transparently in any environment, with only the Unix argv and envp as conventions.

But isn't the whole "anything >= 128 will be reported as 128, and anything higher is actually 128+signum" also a convention that both need to agree upon?

P will do something to report info about C to G, and P and G need to agree about how said info will be reported. Using 128+signum is one way, using an fd for the full/correct info is another. The latter being an option, it wouldn't change what P returns, but be an additional means to provide accurate information to grandprocess G should they need to.

Or just, like shells, assume it's not needed and simply only do the 128+signum convention.

Noting that shells do not actually clamp the exit code to 128. As illustrated by Peter's example, shells return the exit code (up to 255 included), or 128+signum. So assuming no signal, you get the accurate exit code. But of course, with anything higher than 128 there's no way of knowing if it was an exit code or a signal (unless you know exit codes don't go that high).

(Clamping provides better results though, so I'm not saying don't do it; Just the difference shall probably be pointed out/documented.)

>> Cause that was my idea as well: return the exit code or 255.
>
> I was considering it for a while, then figured that the signal number is an interesting information to have, if G remotely cares about C crashing. I prefer to reserve the whole range of 128+ for "something went very wrong, most likely a crash at some point", and if you get 129+, it was directly below you and you get the signal number.
>
>> Though if you want shell compatibility you could also have an option to return exit code, or 128+signum when signaled, and similarly one would either be fine with that, or have to use the fd for full/complete info.
>
> Programs that can establish a convention between one another are easy to deal with. If I remember to document the convention (finish scripts *whistle*)
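To make the "no way of knowing if it was an exit code or a signal" point concrete, here is a small Python sketch. It assumes a POSIX sh (dash or bash style, where a signal death is reported as 128+signum) and that SIGTERM is signal 15; the helper name status_seen_by_parent is made up for this illustration.

```python
import subprocess


def status_seen_by_parent(command: str) -> int:
    """Run `command` under a wrapper sh and return the wrapper's exit code.

    The trailing "exit $?" keeps the wrapper shell alive as a real parent
    instead of letting it exec the command directly.
    """
    result = subprocess.run(
        ["sh", "-c", command + "; exit $?"],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# A child that simply exits with code 143.
exited_143 = status_seen_by_parent("sh -c 'exit 143'")

# A child shell that kills itself with SIGTERM (assumed to be signal 15).
killed_by_term = status_seen_by_parent("sh -c 'kill -TERM $$'")

# Exit codes above 128 are passed through unclamped by the shell.
exited_255 = status_seen_by_parent("sh -c 'exit 255'")

print(exited_143, killed_by_term, exited_255)
```

With such a shell, the first two values come out identical, so the caller cannot tell an "exit 143" from a death by SIGTERM, while the third shows that the shell does not clamp high exit codes.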
Re: How to report a death by signal ?
On 18/02/2015 14:55, Olivier Brunel wrote:
> But isn't the whole "anything >= 128 will be reported as 128, and anything higher is actually 128+signum" also a convention that both need to agree upon?

Sure, but most commands exit < 128 so that's reliable enough, and it's a lot easier to follow than the whole pipe shebang. It's much, much simpler to exit with a given code than to write stuff to a pipe (what do you do if it blocks ? what do you do if you're fd-constrained ? what do you do if setting up the plumbing in the parent fails for whatever reason ? etc. etc.)

> Noting that shells do not actually clamp the exit code to 128.

Indeed, but it comes at the price of uncertainty - you get accurate information if you're lucky, and complete misinformation if you're not. It works for shells most of the time because you don't manually nest shells - it's much riskier for execline.

> Just the difference shall probably be pointed out/documented.)

Definitely.

--
Laurent
Re: How to report a death by signal ?
On 18/02/2015 14:20, Peter Pentchev wrote:
> [roam@straylight ~]$ perl -e 'die("foo!\n");'; echo $?
> foo!
> 255

I think you should be ok, for the same reason why a shell is ok: if you're using Perl, you're most likely writing your whole script with it, especially control flow and error/crash checking. You're not playing with an inner interpreter reporting a code to an outer interpreter. So the weird 255 should not be a problem in practice.

If I'm wrong and your use case precisely involves a perl script running as P or C with G being an execline command, please mention it! Just because I'd be curious. :)

--
Laurent
Re: How to report a death by signal ?
On 18/02/2015 11:58, Peter Pentchev wrote:
> OK, so the "not using the whole range of valid exit codes" point rules out my obvious reply - "do what the shell does - exit 128 + signum".

Well the shell is happily ignoring the problem, but it doesn't mean it has solved it. The shell reserves a few exit codes, then does some best effort, hoping its invoked commands do not step on its feet. It works because most commands will avoid exiting something > 125, but it's still a convention, and most importantly, the shell itself does not follow that convention (it obviously cannot!)

So, something like sh -c "sh -c foobar" does not report errors properly: for 126 and 127, there's no way to know if the code belongs to the inner shell or the outer shell, and for 128+, there's no way to know if the inner shell or the foobar process got killed.

Shells get away with it because when they're nested, it's usually auto-subshell magic and users don't want to know about the inner shell; but here, I'm trying to solve the problem for execline commands, and those tend to be nested a lot - so I definitely cannot reserve codes for the outer command, because the inner command may very well use the same ones too.

> Now the question is, do you want to solve this problem in general, or do you want to solve it for a particular combination of programs, even if new programs may be added to that combination in the future, but only under certain rules? If it's the former (in general), then, sorry, I don't have a satisfactory answer for you, and the fact that the POSIX shell still keeps the "exit 128 + signum" behavior mostly means that nobody else has come up with a better one, either (or it might be available at least as some kind of an option).

It just means that nobody cares about shell exit codes. Error handling, if any, is done inside of shell scripts anyway; and in most scripts, a random signal killing a running command isn't even something people think about, and I'm sure there are hilarious behaviours hiding in dark corners of very popular shell scripts, that fortunately remain asleep to this day.

For execline, however, I cannot use the same casual approach. Execline scripts live and die by proper exit code reporting, and carelessness may lead to very obvious breakage.

> Personally, I quite like the idea of some kind of a pipe (be it a pipe(2) pair of file descriptors or an AF_UNIX/PF_UNSPEC socketpair or some other kind of communication channel based on file descriptors), even if it is only unidirectional:

Oh, don't get me wrong, I'm a fan of child-to-parent communication via pipes, and I use it wherever applicable. Unfortunately, the child may be anything here, so I need something generic.

Thanks for your input !

--
Laurent
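The nested-shell ambiguity described in this message can be reproduced with a short Python sketch. It assumes a POSIX sh that reports "command not found" as 127 and a signal death as 128+signum, with SIGTERM being signal 15; the helper names sh_status and nest, and the command name no-such-command-xyz, are made up for the illustration.

```python
import shlex
import subprocess


def sh_status(script: str) -> int:
    """Exit code of `sh -c script` as seen by this process."""
    return subprocess.run(
        ["sh", "-c", script], stderr=subprocess.DEVNULL
    ).returncode


def nest(script: str) -> str:
    """Wrap a script in one more `sh -c` layer that passes the status up.

    The trailing "exit $?" stops the wrapping shell from exec'ing the
    inner command, so there really is one extra shell in the chain.
    """
    return "sh -c " + shlex.quote(script) + "; exit $?"


missing = "no-such-command-xyz"   # assumed not to exist on PATH
kill_self = "kill -TERM $$"       # the shell running this kills itself

# 127: did the outer shell or the inner shell fail to find the command?
missing_in_outer = sh_status(missing)
missing_in_inner = sh_status(nest(missing))

# 128+: was the inner shell killed, or the process it was running?
inner_shell_killed = sh_status(nest(kill_self))
process_below_killed = sh_status(nest(nest(kill_self)))

print(missing_in_outer, missing_in_inner)
print(inner_shell_killed, process_below_killed)
```

Under those assumptions each pair prints the same number twice, which is the loss of information being discussed: the outermost caller sees one code and cannot tell at which level it originated.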
Re: How to report a death by signal ?
On 18/02/2015 14:04, Olivier Brunel wrote:
> I don't follow, what's wrong with using a fd?

It needs a convention between G and P. And I can't do that, because G and P are not necessarily both execline commands. They are normal Unix programs, and the whole point of execline is to have commands that work transparently in any environment, with only the Unix argv and envp as conventions.

> Cause that was my idea as well: return the exit code or 255.

I was considering it for a while, then figured that the signal number is an interesting information to have, if G remotely cares about C crashing. I prefer to reserve the whole range of 128+ for "something went very wrong, most likely a crash at some point", and if you get 129+, it was directly below you and you get the signal number.

> Though if you want shell compatibility you could also have an option to return exit code, or 128+signum when signaled, and similarly one would either be fine with that, or have to use the fd for full/complete info.

Programs that can establish a convention between one another are easy to deal with. If I remember to document the convention (finish scripts *whistle*)

--
Laurent
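Seen from G's side, the reserved 128+ range could be read back as in the following Python sketch. It assumes the scheme discussed in this thread (codes below 128 are passed through unchanged, 128 means "something went wrong further down", 129 and above mean 128 + signal number); the function name interpret and the labels it returns are made up for the illustration.

```python
from typing import Optional, Tuple


def interpret(code: int) -> Tuple[str, Optional[int]]:
    """Decode an exit code received from P under the assumed convention."""
    if not 0 <= code <= 255:
        raise ValueError("exit codes are in the range 0..255")
    if code < 128:
        # Ordinary exit code, reported accurately.
        return ("exit", code)
    if code == 128:
        # Something went very wrong somewhere below; no further detail.
        return ("abnormal", None)
    # 129+: the process directly below was killed by this signal number.
    return ("signal", code - 128)


print(interpret(1))
print(interpret(128))
print(interpret(143))
```

The point of the sketch is that G needs no pipe or shared state to do this, only the exit code itself.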
Re: How to report a death by signal ?
On Wed, Feb 18, 2015 at 01:58:34PM +0100, Laurent Bercot wrote:
> I'm leaning more and more towards the following approach:
> - child crashed: exit 128 + signal number
> - child exited with 128 or more: exit 128
> - else: exit the child's exit code.
>
> Assuming normal commands never exit more than 127, that reports the whole information to the immediate parent, and correct information, if incomplete, higher up. That should be enough to make things work in all cases.
>
> Thoughts ?

Hm, just a point here; not saying it's very important, but people do sometimes use Perl for stuff :) [0]

[roam@straylight ~]$ perl -e 'die("foo!\n");'; echo $?
foo!
255
[roam@straylight ~]$

Yes, I know, I know, I myself have no idea why die() exits with 255 instead of, say, 1... or rather, yes, I do have an idea (make it different from any code that a usual program would exit with), but I do recognize it as weird :(

So, hm, I don't know... Clamp the "killed by a signal" range to 128-191? Unfortunately, I'm not sure (couldn't find it in a quick browsing of the POSIX specs) whether there is actually a defined maximum signal number; "anything higher than 63 is reported as 63" might be an option here.

Or, of course, just ignore Perl's die()'s weirdness or rather wrap it up in "any exit code larger than 128 is unusual enough to be lumped together with other exit codes larger than 128", as your proposal already states... so apologies for the wasted electrons, I guess :)

G'luck,
Peter

[0] ...and, yes, at $RealJob we do use a Perl loader/watcher for several important daemons... but then we have to keep compatibility with various init systems, we can't force a choice upon our customers :)

--
Peter Pentchev  r...@ringlet.net r...@freebsd.org p.penc...@storpool.com
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
Re: How to report a death by signal ?
I'm leaning more and more towards the following approach:
- child crashed: exit 128 + signal number
- child exited with 128 or more: exit 128
- else: exit the child's exit code.

Assuming normal commands never exit more than 127, that reports the whole information to the immediate parent, and correct information, if incomplete, higher up. That should be enough to make things work in all cases.

Thoughts ?

--
Laurent
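The three rules above can be sketched in Python as a mapping from a raw wait status to the code P would exit with. This is only an illustration, not execline code: the function names code_to_report and wait_status_of are made up, and the examples assume a POSIX sh is available and that SIGTERM is signal 15.

```python
import os


def code_to_report(wait_status: int) -> int:
    """Apply the three proposed rules to a status returned by waitpid()."""
    if os.WIFSIGNALED(wait_status):
        # Child crashed: 128 + signal number.
        return 128 + os.WTERMSIG(wait_status)
    if os.WIFEXITED(wait_status):
        code = os.WEXITSTATUS(wait_status)
        # Child exited with 128 or more: clamp to 128; else pass through.
        return 128 if code >= 128 else code
    raise ValueError("child neither exited nor was killed by a signal")


def wait_status_of(script: str) -> int:
    """Run `sh -c script` as a direct child and return its raw wait status."""
    pid = os.spawnvp(os.P_NOWAIT, "sh", ["sh", "-c", script])
    _, status = os.waitpid(pid, 0)
    return status


print(code_to_report(wait_status_of("exit 3")))         # ordinary exit code
print(code_to_report(wait_status_of("exit 255")))       # high exit code
print(code_to_report(wait_status_of("kill -TERM $$")))  # death by signal
```

Under this mapping a child exiting 255 is reported as 128, so a high exit code is lumped into "something went wrong" rather than being mistaken for a signal death.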