On 04/26/2018 02:17 AM, Assaf Gordon wrote:
> Hello,
>
> Attached an updated patch, hopefully addressing all the issues below.
> Initial documentation also included (separated into several commits to ease
> review).

Question - do we want to use 'S::' instead of 'S:' in the optstring? Right now, your patches made -S take a mandatory argument, making:

  #!/usr/bin/env -S

try to parse the script name as a string to be split (and unless the script name has unusual characters, this results in an infloop of treating the script name as the interpreter, for another round of trying to exec the same command line). Although this is probably not what was intended, and the shebang line is already suspicious for not providing more text after -S, making the argument optional and then issuing an error message if optarg is NULL would at least make this not an easy infloop.

I'm also trying to think what happens if we want to support platforms where the OS splits strings passed to shebang. (The BSD implementation didn't have to worry quite as much about their code being run on a different OS, like we do). Consider:

  #!/usr/bin/env -S interpreter 'arg with space'

where it already sees "-S", "interpreter", "'arg", "with", "space'", "script", "args..." as separate arguments. If we use 'S:' to getopt_long, then "interpreter" will be subject to -S handling but nothing else will; if we use 'S::', then none of the subsequent arguments will be subject to -S handling (but then we have to revisit whether a NULL optarg would be treated as an error on a shebang line that ends in -S). But either way, it would be nice if we could reconstruct "arg with space" as a single argument to hand to "interpreter", rather than three separate arguments where two of them include a lone "'".
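
To make the two kernel behaviours concrete, here is a rough model of the argv that env would receive in each case. It is written in Python purely for illustration (the patch itself is C), and `shebang_argv` and its `kernel_splits` flag are hypothetical names, not anything in coreutils. The "splitting" branch assumes a platform that breaks the rest of the line on whitespace with no quote handling.

```python
def shebang_argv(shebang, script, script_args=(), kernel_splits=False):
    """Model the argv handed to the interpreter named on a '#!' line.

    kernel_splits=False: Linux-like, the text after the interpreter path
    is passed as ONE argument.
    kernel_splits=True: assumed platform that splits that text on
    whitespace (no quote processing).
    """
    if not shebang.startswith("#!"):
        raise ValueError("not a shebang line")
    parts = shebang[2:].strip().split(None, 1)
    interpreter = parts[0]
    tail = parts[1] if len(parts) > 1 else ""
    if not tail:
        extra = []
    elif kernel_splits:
        extra = tail.split()
    else:
        extra = [tail]
    # The kernel appends the script path, then the user's own arguments.
    return [interpreter, *extra, script, *script_args]


line = "#!/usr/bin/env -S interpreter 'arg with space'"
print(shebang_argv(line, "script"))
print(shebang_argv(line, "script", kernel_splits=True))
# A line ending in -S: with a mandatory argument ('S:'), the next argv
# element, the script name, is what getopt would consume as optarg.
print(shebang_argv("#!/usr/bin/env -S", "script"))
```

In the last call the element following "-S" is "script" itself, which is the situation behind the infloop described above.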

I'm wondering if we need yet another magic environment variable for portably marking the demarcation between the arguments to -S and the script name, whether the script is run on a platform that hands -S a single string, or run on a platform that splits arguments, as in:

  #!/usr/bin/env -S interpreter 'arg with spaces' ${_ENV_END}

which on Linux calls "/usr/bin/env" "-S interpreter 'arg with spaces' ${_ENV_END}" "script", but elsewhere calls "/usr/bin/env" "-S" "interpreter" "'arg" "with" "spaces'" "${_ENV_END}" "script".

With the magic marker in place, -S can be used as a toggle mode that says to look if ANY later argument is the magic marker ${_ENV_END}, or maybe do this look forward only if optarg for -S contains no spaces, because if a space is present at all, we know the kernel did not split the shebang line. If no spaces are present in optarg, but ${_ENV_END} is present as a later argument, then attempt to reconstruct the same command line as if all arguments in between had been a single string (so that quote and escape processing is performed on the remaining arguments of the shebang line, but not on the script name or arguments). If ${_ENV_END} is not present, we can't make any assumptions, so we only perform string splitting on optarg, rather than trying to reconstruct a string from a subset of the remaining arguments.

Of course, reconstructing a single string can't tell what whitespace the kernel ate in providing multiple arguments, so it will corrupt multiple spaces and/or tabs down to a single space; perhaps the existing \_ escape sequence can be used to overcome the worst effects of that. We'd probably want to document that the expansion of ${_ENV_END} is always empty, even if someone defines that variable in the environment?

Another question: Does the BSD implementation have any way to pass empty strings as explicit arguments?
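
A rough sketch of the ${_ENV_END} look-ahead described above, again in Python only for illustration. The function name `split_for_S` is hypothetical, and `shlex.split` is merely a stand-in for env's own -S quote and escape rules, which differ in detail (it knows nothing about \_, for instance). The marker is dropped by plain text replacement, which is cruder than a real variable expansion to the empty string.

```python
import shlex

MARKER = "${_ENV_END}"  # proposed marker; its expansion is always empty


def split_for_S(optarg, later_args):
    """Return (words produced by -S, arguments left untouched).

    optarg     : the argument getopt associated with -S
    later_args : every argv element after optarg
    """
    if " " not in optarg and MARKER in later_args:
        # The kernel apparently split the shebang line: glue the pieces
        # back together with single spaces, then split with quote handling.
        end = later_args.index(MARKER)
        joined = " ".join([optarg, *later_args[:end]])
        return shlex.split(joined), later_args[end + 1:]
    # Either the kernel passed one string, or there is no marker, so no
    # assumptions can be made: split optarg alone.
    return shlex.split(optarg.replace(MARKER, "")), later_args


# Linux-like: one string, marker inside optarg.
print(split_for_S("interpreter 'arg with spaces' ${_ENV_END}", ["script"]))
# Splitting platform: marker shows up as a later argument.
print(split_for_S("interpreter",
                  ["'arg", "with", "spaces'", MARKER, "script"]))
```

Both calls yield the words "interpreter" and "arg with spaces" with "script" left alone. A run of several spaces inside the quotes would come back as a single space in the second case, which is the corruption mentioned above. The stand-in splitter also happens to keep a quoted '' as an empty argument, which bears on the empty-string question.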
The code you posted turns:

  #!/usr/bin/env -S sh -c '' echo

into "sh" "-c" "echo" "script", which did NOT preserve the empty string.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org