1 Summary

XCU Ch2.14 states that 'return' shall cause the shell to leave the current function or dot script, if any. Ch2.9.5 says that execution shall continue with the next command after the function call. Implementations that claim conformance consistently contradict this specification if the function has created a subshell. They can't both be right. As the specification was in part intended to codify existing practice, how did this contradiction arise?

2 Description

2.1 Relevant specifications

POSIX Vol3 (XCU) Ch2.14 'return' <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#return> says, and has since at least 2004:
The return utility shall cause the shell to stop executing the current function or dot script. If the shell is not currently executing a function or dot script, the results are unspecified.
In case this wasn't clear enough, in Vol3 Ch2.9.5 <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_05>:
A function is a user-defined name that is used as a simple command to call a compound command with new positional parameters. ... The compound-command shall be executed whenever the function name is specified as the name of a simple command... If the special built-in return ... is executed in the compound-command, the function completes and execution shall resume with the next command after the function call.
So not with some other command inside the function, you'd imagine.

2.2 The issue

Implementations consistently contradict the above wording when the 'return' is in a 'subshell' (probably any 'separate shell execution context'), treating this use of return as if the script used 'exit'. Yet suppliers have been claiming conformance to the existing wording. Are they all wrong, or is there an adaptation or interpretation of the specification that would align it with reality?

2.3 Example

A simple test case is

    foo() {
        (
            if [ "$1" = fum ]; then
                echo EQ; return 0;
            fi
        )
        echo NE; return 1
    }

with expected result

    $ foo fum || echo WTF
    EQ
    $

but in the several modern shells (dash-0.5.9.1 and earlier, bash-4.3-14ubuntu1.4, busybox-static-1:1.22.0-15ubuntu1.4) tested, we get

    $ foo fum || echo WTF
    EQ
    NE
    WTF
    $

2.4 Interpretations

One oracle has said:
In the subshell, the shell should not be considered to still be executing a function or dot script. As such, the results should be unspecified, and any behaviour should be valid. The standard may be underspecified here, but any other interpretation is not reasonable.
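Which behaviour a given shell actually exhibits can be probed directly. What follows is only a minimal sketch: the function names probe_return_scope and report_return_scope are hypothetical, and under the reading quoted above a shell would be free to do something else entirely.

```shell
#!/bin/sh
# Hypothetical probe, not taken from the standard or from any shell's
# documentation.

probe_return_scope() {
    # If 'return' here leaves the whole function, the status is 0.
    ( return 0 )
    # Reached only when 'return' ended just the subshell.
    return 1
}

report_return_scope() {
    if probe_return_scope; then
        echo "function"     # 'return' left the function
    else
        echo "subshell"     # 'return' left only the subshell
    fi
}

report_return_scope
```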
But if you read the standard without having knowledge of existing shell internals, it's entirely reasonable (and IMO desirable) to consider that a shell function is a lexical group, like a script file, which is being executed as long as any command within the function's defining compound-command is running. To interpret the text as quoted above, the reader would have to believe that "subshell" was accidentally omitted from the list of contexts that 'return' should return from, even though the spec refers to subshells explicitly elsewhere (eg 'exit').

The resolution of a related rejected Defect Report 1042 <https://www.austingroupbugs.net/view.php?id=1042#c3120> says:
... the results of using return when you are not in a shell execution environment running a function or a dot script is unspecified.
But this is restating the wording of the standard, unless "in a shell execution environment" means "in a shell execution environment, and not in a subshell environment thereof", which, as argued above, is additional to that wording.

Under DR 842, clarifications <https://www.austingroupbugs.net/view.php?id=842#c2257> have been made on the scope of the 'break' and 'continue' special utilities so that the expected behaviour matches the specification; 'return' did not receive such attention.

3 Resolution

I considered these possible resolutions, though others may exist:

a) the existing 2.14 and 2.9.5 text means to permit the interpretation that 'return' from a function when in a subshell context may just exit the subshell and not return from the function;
b) the existing 2.14 text is consistent with the observed behaviour but the 2.9.5 text specifying function definition must be changed to restrict the types of command that can be used in the definition;
c) the text of 2.14 and 2.9.5 means what it appears to say.

3.1 Resolution (a)

A non-normative clarification should be added to avoid misleading the naive reader, or the normative text might be revised to add 'separate shell execution context' to the list of contexts that 'return' returns from, or to restrict what is meant by executing 'return' "in" a function. However existing scripts should not be affected, and most, if not all, shell implementers can rest easy. Shell programmers would still have to avoid 'return' or use work-arounds to avoid uncertainty as to whether the shell implementation may decide to run some instance of 'return' in a subshell.

As an example of restricting what is meant by executing 'return' "in" a function, the wording used for DR 842 could be adapted (Vol3 Ch2.14 return:DESCRIPTION), the modified wording to cover dot scripts being "for further study", as follows:

"The return utility shall cause the shell to stop executing the current function or dot script that lexically encloses it.
If there is no such function or dot script, the results are unspecified. A function shall lexically enclose a return command if the command is:

- executing in the same execution environment (see section 2.12) as the function's defining compound-command (see section 2.9.5), and
- contained in the function's defining compound-command, and
- not contained in the body of a function whose function definition command is contained in the function's defining compound-command."

As the text of 2.9.5 cross-references 2.14, this should also be sufficient to clarify the wording of the final sentence quoted above (ie above 'Exit Status' in the text).

One may ignore, or not, that, as with the DR 842 wording and the resulting standard text, the uses of the terms "encloses" and "lexically encloses" are inverted from common usage: ie, "lexically encloses" should have been used to mean "within the textual definition of" as in bullets 2 and 3 above, and plain "encloses" should have been defined as "lexically encloses" plus bullet 1, execution environments not being "lexical" as normally understood.

3.2 Resolution (b)

The specification of the function definition command would have to exclude any construct that could implicitly or explicitly create a separate shell execution context (although no definitive list of these is included in the current standard). Many existing scripts whose authors expected to be portable according to the standard would suddenly be relying on unspecified behaviour, and shell implementers would have to make a breaking change.

3.3 Resolution (c)

Future shell programmers would not have to avoid 'return' or use work-arounds in shell functions to avoid uncertainty as to whether the shell implementation has decided to run some instance of 'return' in a subshell. Existing scripts that relied on the interpretation (a) might behave differently, and shell implementers would have to make (what they would consider) a breaking change.
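For reference, a minimal sketch of the kind of work-around scripts use today, adapting the test case from section 2.3. The restructuring is my assumption for illustration, not wording from the standard: the whole body runs in the subshell and a bare 'return' follows it, so the function's status should come out the same whether 'return' ends only the subshell or the whole function.

```shell
#!/bin/sh
# Illustrative adaptation of the section 2.3 test case; the layout is an
# assumption, not text from the standard.

foo() {
    (
        if [ "$1" = fum ]; then
            echo EQ
            return 0    # ends the subshell, or the function, with status 0
        fi
        echo NE
        return 1        # likewise, with status 1
    )
    # If control gets here, 'return' only left the subshell; a bare
    # 'return' passes on the status of the last command, the subshell.
    return
}

foo fum || echo WTF
```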
However, an existing script work-around on the lines of "subshell-command-including-return; return" should not be affected.

A non-normative clarification could be added to the standard to deprecate the historical interpretation, say at Vol3 Ch2.14 return:RATIONALE (new para 1):

"The return command is defined relative to the function or dot script that contains it; it is considered to be executed in that function or dot script even if the execution of the function or dot script may have created a separate shell execution context when the return command is executed. However this interpretation was not widely adopted in historical implementations; consequently shell programmers may need to work around such implementation differences to achieve portability outside the scope of this standard."

/df
--
London SW6 UK