    Date:        Wed, 17 May 2023 17:23:21 +1000
    From:        Martin D Kealey <mar...@kurahaupo.gen.nz>
    Message-ID:  <can_u6mwehjth58afwbbvfxud7esujjhdbc9mw4+dgic0zom...@mail.gmail.com>

  | I suspect putting "local" in a loop is doing something strange.

"local" is an executable statement, not a declaration (shell really has
none of the latter). Every time it is executed it creates a new local
variable, which remains until the function exits (there are no local scope
rules in shell either). That should make no difference to this code
though, and the difference you report likely hints at the source of the
problem.

The code is written weirdly, however. This sequence

        code=0; wait -n || code=$?

could just be

        wait -n; code=$?

(the "local" that might be there makes no difference, or shouldn't, to
the execution semantics).

Getting status==127 out of the waitjobs function should be impossible, as
status starts out being 0 and is only changed to $code if $code!=127, so
if that ever happens, there looks to be a bug somewhere.

oguzismailuy...@gmail.com said:
  | There is no guarantee that `wait -n' will report the status of `true', the
  | shell may acquire the status of `false' first.

That should be irrelevant. waitjobs() has a loop that explicitly waits for
wait -n to return 127 (which it does not return to the caller, or should
not), which should mean that there are no children remaining.

Further, as long as the wait -n call in waitjobs actually reaps the exit
from false, waitjobs should always return with status==1 (the exit status
from false). Since false & true should both always be running in the bg
when waitjobs is called, the exit status from false should always be
obtained (fairly quickly, since it doesn't run for very long), causing
code==1 and hence status==1. After that, status will never be altered
again, as it isn't touched if code==0 or code==127, which should be the
only other 2 returns from wait -n.

I modified the script to get rid of the (()) usage and replace that with
the similar [ ] code, which made no difference at all when executed under
bash: it still ends the outer loop, reasonably quickly.
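
For reference, a minimal sketch of the kind of [ ] variant described above,
assuming bash 4.3 or later for wait -n. The original script is not quoted
in this message, so the function body, the staggered sleeps and the
"result" variable are assumptions made for illustration only, not the
script that was actually tested:

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of waitjobs; the real script is not shown
# in this message, so treat every detail here as an assumption.
waitjobs() {
    local status=0 code            # executed once, outside the loop
    while :; do
        wait -n; code=$?           # same effect as: code=0; wait -n || code=$?
        [ "$code" -eq 127 ] && break       # 127: no children left to wait for
        [ "$code" -ne 0 ] && status=$code  # status untouched when code is 0
    done
    return "$status"
}

# Stand-ins for "false & true &", staggered so that both jobs are still
# running when wait -n is first called (avoids the suspected race).
(sleep 0.2; exit 1) &
(sleep 0.1; exit 0) &

waitjobs
result=$?
echo "waitjobs returned $result"
```

With the jobs staggered like this, waitjobs should report the non-zero
status of the failing job and never 127, which is the behaviour argued
for above.
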
But then I could run the script using the NetBSD shell, where it (seems
to) run forever (ie: it is still running - but forever hasn't been
reached yet).

I think there is a bug, probably some race condition in bash with the
jobs table, causing the "false" job to get missed sometimes when running
this code. That allows status to remain 0, the outer loop to break, and
the script to terminate. Most likely the use of "local" in the loop, which
caused the difference that Martin noticed, alters the timing enough to
affect the race results.

kre