While generating https://www.austingroupbugs.net/view.php?id=1778#c6550
(note 6550 to bug 1778, mostly about field splitting with the read utility, and in particular whether reading into some vars should have unspecified effects if changes to those variables could affect the field splitting behaviour - reading into, and hence changing, IFS is an obvious example) and even earlier, I started to consider what the relationship should be between shell variables and shell built-in utilities.

Utilities like read (also getopts, cd, ...) which (almost) must be built in as they are specified to alter shell variables are something of a special case, so I'll defer discussion of those until later in this message.

[Aside: just "almost must be built in" for some of these, as an implementation could have some other method to allow a utility to interact with the shell, and use that to allow designated utilities to alter shell variables, or other aspects of the shell environment.]

So, for now, let's just consider the "often" built in utilities, like printf, echo, test (aka '[') etc. With those, if a shell does something like

    unset LANG LC_ALL LC_CTYPE LC_COLLATE LC_MONETARY LC_TIME ....
    LANG=weird
    printf format arg arg arg

Is printf allowed to, required to, or prohibited from doing its output as if LANG==weird ?

Note that LANG here is not exported (that was part of the point of the unset) and if printf were not built in, it would have no access to the shell's internal LANG variable. But if it is built in, it does. Is there any language in the current (or forthcoming) standard that is intended to specify this? (If anyone knows of some, please reference or quote it.)

Similarly with test, and the collating sequence for the weird LANG.

Note that if we were instead to do

    export LANG=weird
    printf format arg arg arg

or

    LANG=weird printf format arg arg arg

then it is clear that the exported LANG is intended (required) for printf to use (and similarly for any other utilities, built-in or not).
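For anyone wanting to see the distinction being drawn here, the following is a minimal sketch, not anything the standard mandates. It uses a made-up variable, DEMO_VAR, in place of LANG (so real locale settings are left alone), and a child sh standing in for a utility that is not built in; both are assumptions for illustration only.

```shell
# DEMO_VAR is a hypothetical stand-in for LANG, so real locale settings are untouched.
unset DEMO_VAR

# Command run by a child shell: report the value received in its environment.
show='printf %s "${DEMO_VAR-<unset>}"'

DEMO_VAR=weird                            # plain shell variable, not exported
not_exported=$(sh -c "$show")

prefix=$(DEMO_VAR=other sh -c "$show")    # exported for this one command only

export DEMO_VAR                           # exported from here on
exported=$(sh -c "$show")

printf 'not exported: %s\n' "$not_exported"
printf 'prefix:       %s\n' "$prefix"
printf 'exported:     %s\n' "$exported"
```

The first case is the one in question: a utility that is not built in never sees the value at all, whereas a built-in one could find it if it chose to look at the shell's variables.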
Now we get to the issue of those utilities which are required to alter shell variables, where for consistency I think some of the answers will depend upon the answer to the question above.

Let's take a particularly simple (and now clear) example first

    X=whatever
    X=something unset X

In the forthcoming standard, it is clear that when this completes, X must be unset, and not have either "whatever" or "something" as its value, and must not be exported. That applies to any special built-in utility which modifies shell variables.

Now let's look at a closely related (but much more complex) case

    X=whatever
    X=something . script

and assume the script does X=newvalue as one of its commands (whole command, not a var-assign for something else), and that that is the sole mention of X in "script" (or perhaps it is expanded as well, but that doesn't affect its value).

Since '.' is a special builtin, I believe the same rule applies, and that when the dot script completes, the shell environment should have X=newvalue as part of it, though it is less clear to me what the requirement is wrt X's export status (must be, must not be, unspecified whether ...). If we had instead unset X; X=newvalue in the script, then I think it would be clear, when the script is complete the shell environment must have X=newvalue and X must not be exported.

[Aside: for anyone wanting to make exceptions in case X is readonly, then we know here it cannot be, as we are making assignments to X before running the dot script.]

To make this less abstract, a more likely example perhaps

    PATH=/where/my/script/lives . script

and "script" sets PATH to whatever I really want it to be. That might be all it does, script might be a single line containing PATH=/bin:/usr/bin (or something).

There'd be no question if I instead did

    . /where/my/script/lives/script

but I didn't, I chose to find the script using the temporary exported PATH.
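The X example can be tried in any shell with something like the probe below. This is only a sketch: the temporary directory and the one-line script file are made up for the test, and the probe reports what the shell under test does, which may or may not be what the standard asks for.

```shell
# Probe: does an assignment made inside a dot script survive when the same
# variable was also given as a prefix assignment to '.'?
# The directory and script file are hypothetical, created just for this test.
dir=$(mktemp -d)
printf 'X=newvalue\n' > "$dir/script"

result=$(
    unset X
    X=whatever
    X=something . "$dir/script"
    # First the value in the shell, then the value an external command receives.
    printf 'shell=%s env=%s' "${X-<unset>}" "$(sh -c 'printf %s "${X-<unset>}"')"
)
rm -rf "$dir"
printf '%s\n' "$result"
```

By the reading given above, the shell= part should be newvalue; the env= part shows which export status the shell under test picked. The probe runs in a subshell so that the calling shell's X is left alone.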
All of this is now (will be in POSIX Issue 8) specified for special built in utilities. In the PATH example, in both invocations, PATH must end up being what the script set it to, not whatever it had previously held, and not the value exported into the script in the first invocation (though that would be what it would be required to be if the script did not set PATH).

But all that doesn't cover other utilities that are built in, which are not special built-in, like read, cd and getopts, but which do set variables. It would (or could) also cover extensions in various shells, like bash's printf's -v option (write the output into a shell variable) or its %n format specifier (next arg is a var name, which gets set to the number of bytes (or maybe chars, doesn't matter here) which have been output before that format specifier, just like printf(3)).

OK, first question here, and remember to consider your answer to the questions above about built in printf, test, etc, if we do

    unset IFS
    IFS=:
    read -r a b c

If the line read from standard input (without the leading white space, that's just for this email, only the spaces between y and p, and q and r exist in the line) is

    x:y p:q r

should we end up with

    a=x b='y p' c='q r'

or

    a=x:y b=p:q c=r

Why, and what in the standard specifies (or allows) the answer you believe to be correct?

Second question (and for this one, assume that the read implementation behaves as is specified in mantis bug 1778 - which all shells do for this purpose as far as I can tell). If we have

    X=foo
    IFS= X=bar read -r X

and the input is "abc\n" (the \n becomes the line delimiter, and is removed) then what value will be in X after this is complete? Why?

Note that here we explicitly export IFS to read, avoiding the issue of the previous question, and by giving read an empty value for IFS, we explicitly say "no field splitting happens", so the line read is to be assigned to the variable named, which is X in this case, so read will assign "abc" to X.
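Both read questions can be tried directly; what follows is a rough sketch using here-documents to supply the input (an assumption made for convenience, the examples above just read standard input). It shows what a shell does, not what it ought to do.

```shell
# First question: IFS is set in the shell but not exported.
unset IFS
IFS=:
read -r a b c <<EOF
x:y p:q r
EOF
unset IFS                      # back to default field splitting
printf 'a=%s b=%s c=%s\n' "$a" "$b" "$c"

# Second question: X is both a prefix assignment to read and its target.
X=foo
IFS= X=bar read -r X <<EOF
abc
EOF
printf 'X=%s\n' "$X"
```

Shells with a built-in read split the first line on the colons, giving a=x, b='y p' and c='q r'. The value printed for X by the second part is the open question, and is best compared across several shells.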
But what happens in the shell afterwards? Where is that specified?

Now let's go one step further, and consider combining all of this into a more complex example, where hopefully the answers we get will be consistent with the answers to the previous questions.

Consider now a case where a variable that a (not special) built-in utility uses as part of its operation is exported to that utility, and the utility sets that variable (into the shell environment) as a part of its operation.

As a first, and particularly absurd, example - and one which is using extensions to the standard, let's consider bash's printf %n format operator. The idea here is to get some general rules for how built-in utilities are supposed to operate in general, not to comment on bash, or %n formats, etc, so please just assume all of that part were to be made standard for this.

Consider (with all locale vars initially unset)

    LANG=en_US.UTF-8 printf $'Hello %s%n \xC3\xAC\n' World LANG

Now after printf has output "Hello World" it has written 11 bytes (and characters) to standard output, so the %n format causes LANG=11 to be executed.

First question for this example - is the change made to LANG in the shell environment (remember that LANG there is unset, and so not exported) supposed to have any influence at all upon how printf operates, or should it see only the value exported to it in its environment?

If the answer is the latter (no changes should be visible inside printf) then there's nothing more to answer. Otherwise, what does printf output for the two bytes given in hex here, one UTF-8 encoded character, or just 2 bytes as we end up by default with LANG=C (as there almost certainly is no language called "11" - let's just assume that is true for the purposes of this message).
While that probably makes no difference in this particular example, the bytes output would be the same, consider instead a case where the default shell LANG at the time wasn't unset, so we had (all other locale vars still unset)

    LANG=en_US.UTF-8
    LANG=zh_TW.BIG5 printf $'Hello %s%n \xC3\xAC\n' World LANG

When the shell encodes the $'' string, it is using en_US.UTF-8 (I presume, assume these are 2 interactive commands entered at the keyboard, the first is parsed and executed before the 2nd is read) and so that will be a single UTF-8 encoded character (I presume, I really know nothing about char encoding or locales) but when printf comes to output those bytes, they are to either be output as Taiwanese Chinese BIG 5 encoding, or as 2 bytes of (kind of) ascii ?

Note here I am not really interested in the locale or encoding aspects, the underlying question is whether when a built-in utility alters the value of a shell variable, should that utility be able to (or required to, or not permitted to) access that variable for its own uses later, and does that differ if a particular value for the variable has been exported into its environment for it to use ?

Now all this is leading up to this example ...

    unset IFS
    IFS=$' \t\n' read IFS v1 v2 v3

(and for this, ignore what is happening in bug 1778, and concentrate just on the philosophy of what should happen).

Take the almost same example input as earlier:

    : x:y p:q r

Now if you're of the opinion that read should not alter any of the variables until field splitting has completed, the answer here is clear, the resulting variables after the read has finished are

    IFS=: v1=x:y v2=p:q v3=r

as the field splitting happens with space in IFS, so that's what is used to split the fields.

On the other hand, for the interesting case, let's assume you're one of the "as each field is split out, assign it to its variable" (which several shells implement).
In that case, the IFS=: assignment happens before field splitting gets to the next field, and the shells that implement things this way use that value for splitting the remaining fields, leading to

    IFS=: v1=x v2='y p' v3='q r'

as the result.

But should they? Not should they assign field by field, that's not the question here, but when field splitting continues, should it be using the value that was passed in the environment for read to use, or should it be digging into the shell to look at variable values that are not exported to it, just because the utility might have changed that value?

I have one more very much related example for everyone to consider...

    pwd; OLDPWD=/foo; OLDPWD=/bar cd /tmp; echo $OLDPWD

where shells produce different results, but where in this case I believe the standard (the current draft for Issue 8 anyway) is clear what is required.

I believe that the output should be two identical lines, containing the path of whatever the current directory was before this command line was executed (and let's assume that was not /tmp just to avoid complications). And I believe that is what Draft 3 of POSIX Issue 8 requires.

Fortunately no shell I tested outputs /bar for the echo (that would clearly be wrong). But many output /foo which makes no sense to me at all, and can only be a bug, there's nothing in the standard I can find which permits that result - the standard says that cd sets OLDPWD to the path to the current working directory before it was changed. That's what the initial "pwd" printed, which is why the two output lines should be the same (and several shells implement it that way). The messing around with the assignments to OLDPWD should be irrelevant, none of that should have any effect upon anything - yet in several shells it does.

Amazingly, the (now older) AST version of ksh93 got this correct, but the current (community maintained) version seems to have broken it.
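The cd example is easy to repeat in other shells with a probe along these lines. It is a sketch only: starting from / is an arbitrary choice (any directory other than /tmp would do), and the subshell is there just to keep the calling shell's directory and OLDPWD undisturbed.

```shell
# Probe: what does OLDPWD hold after cd, when OLDPWD was also given
# as a prefix assignment to that same cd?
result=$(
    cd / || exit 1             # known starting directory, and not /tmp
    before=$(pwd)
    OLDPWD=/foo
    OLDPWD=/bar cd /tmp || exit 1
    printf 'before=%s OLDPWD=%s' "$before" "$OLDPWD"
)
printf '%s\n' "$result"
```

If the reading of the draft given above is right, the two values printed should be the same.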
(It is unfortunate we cannot play games with setting PWD here, as doing that makes cd's behaviour unspecified - but setting OLDPWD does that only in the case of "cd -" which we're not doing here.)

Enough for now - more might follow based upon responses.

One last point, I am asking here more about what should happen, rather than what does happen in current shells - we can get to the latter question if we can form some common opinion about the desired outcomes here.

For shell implementers, and users with a favourite shell, please don't just consider what your own shell does here, unless you particularly carefully pondered this question before, and either implemented what you believe is correct, or picked a shell which had done that - which exception, my guess is, applies to just about no-one (including me).

kre