This new feature allows the COMMAND argument of env to encode multiple extra arguments, and allows the first trailing argument to be relocated among those arguments.
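
For illustration only, the expansion can be sketched as a standalone C function. This is not the code added to src/env.c: expand_notation is a hypothetical name, it uses plain malloc rather than the coreutils allocation wrappers, and it deliberately never frees its buffers on success, on the assumption that the result is handed straight to execvp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: ARGV[0] has the form ":f1:f2:...".  Return a new
   NULL-terminated vector holding the fields followed by the remaining
   arguments, or NULL if allocation fails.  */
char **
expand_notation (char **argv)
{
  assert (argv[0][0] == ':');

  /* Copy the command without its leading ':' so it can be split in place.  */
  size_t len = strlen (argv[0] + 1);
  char *buf = malloc (len + 1);
  if (buf == NULL)
    return NULL;
  memcpy (buf, argv[0] + 1, len + 1);

  /* Count the fields (colons + 1) and the length of the original vector.  */
  size_t nf = 1, argc = 0;
  for (char *p = buf; *p; p++)
    if (*p == ':')
      nf++;
  while (argv[argc])
    argc++;

  /* Room for nf fields, at most argc - 1 trailing arguments, and NULL.  */
  char **out = malloc ((nf + argc) * sizeof *out);
  if (out == NULL)
    {
      free (buf);
      return NULL;
    }

  size_t n = 0, rest = 1;
  char *field = buf;
  for (;;)
    {
      char *end = field + strcspn (field, ":");
      int more = (*end == ':');
      *end = '\0';

      /* Only the leftmost "{}" is replaced, and only if an argument exists.  */
      if (rest == 1 && strcmp (field, "{}") == 0 && argv[1] != NULL)
        out[n++] = argv[rest++];
      else
        out[n++] = field;

      if (!more)
        break;
      field = end + 1;
    }

  /* Append the arguments that were not consumed, then the terminator.  */
  while (argv[rest])
    out[n++] = argv[rest++];
  out[n] = NULL;
  return out;
}
```

Given the vector ":echo:{}:bb", "cc", "dd", the sketch yields "echo", "cc", "bb", "dd", which is the same result that the corresponding case in tests/misc/env.sh expects from env itself.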

* src/env.c (usage): Mention the existence of the feature.
(expand_command_notation): New function.
(main): Detect whether the notation is present, based on the first
character of command.  If so, filter the trailing part of the
argument vector through the expand_command_notation function, and
use that.  Either way, the effective vector is referenced using the
down_argv variable and that is used for the execvp call.  If an
error occurs, the diagnostic refers to the first element of
down_argv rather than the original argv.
* tests/misc/env.sh: Added some test cases.  These do not probe all
the corner cases.  I solemnly declare that I manually tested those
corner cases, like "env :" and "env :{}" and such, and used valgrind
for all the manual testing to be confident that there are no
overruns or uses of uninitialized bytes.
* doc/coreutils.texi: Documented feature.  Added discussion about
how env is often used for the hash bang mechanism, and how the
feature relates to this use.
---
 doc/coreutils.texi | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/env.c          | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 tests/misc/env.sh  | 18 +++++++++++++++
 3 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 1834e92..9e1cb0c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -16879,6 +16879,69 @@ env -u EDITOR PATH=/energy -- e=mc2 bar baz
 
 @end itemize
 
+Note that the ability to run commands in a modified environment is built into
+the shell language, using a very similar @samp{@var{variable}=@var{value}}
+syntax; moreover, that syntax allows commands internal to the shell to be run
+in a modified environment, which is not possible with the external
+@command{env}.  Other scripting languages usually also have their own built-in
+mechanisms for manipulating the environment around the execution of a child
+program.  Therefore the external @command{env} executable is rarely needed for
+the purpose of running a command in a modified environment.  Because the
+@command{env} utility uses @env{PATH} to search for @var{command}, it has come
+to be mainly used as a mechanism in "hash bang" scripting.  In this usage,
+scripts are written using the incantation @samp{#!/usr/bin/env interp} where
+@var{interp} is the name of some scripting language interpreter.  The
+@command{env} utility provides value by searching @env{PATH} for the location
+of the interpreter executable.  This allows the interpreter to be installed in
+some chosen location, without that location having to be edited into the hash
+bang scripts which refer to that interpreter.
+
+On some operating systems, the following issue exists: the hash bang
+interpreter mechanism allows only one argument.  Therefore, if the @command{env}
+incantation @samp{#!/usr/bin/env interp} is used, it is not possible to pass an
+argument to @samp{interp}, which is a crippling limitation in some
+circumstances requiring clumsy workarounds.  To overcome this difficulty, the
+GNU Coreutils version of @command{env} supports a special notation:
+arguments for @var{command} can be embedded in the @var{command} argument
+itself as follows.  If @var{command} begins with the @samp{:} (colon)
+character, then that colon character is removed.  The remainder of the
+argument is treated as a record of colon-separated fields, and split
+accordingly.  For instance, if @var{command} is @samp{:foo:--bar:42}, then
+it is split into the fields @samp{foo}, @samp{--bar} and @samp{42}.  The
+effective command is then just @samp{foo}.  The other two fields will be
+passed as the first two arguments to @samp{foo}, inserted before the
+remaining @var{args}, if @samp{foo} is successfully found using
+@env{PATH} and executed.
+Furthermore, this special notation supports one more refinement.
+If, after colon splitting, one or more of the fields are
+equal to the character string @samp{@{@}} (open brace, close brace)
+then the leftmost such field is replaced with the first of the @var{args}
+which follow @var{command}.  In this case, that argument is removed from
+@var{args}.  If @var{args} is empty, then the field is not replaced.
+
+Example: @command{env} hash bang line for a script executed by the
+fictitious @samp{intercal} interpreter.  The @samp{--strict-iso} option
+is passed to the interpreter, and the @samp{--verbose} option is
+passed to the script:
+
+@example
+#!/usr/bin/env :intercal:--strict-iso:@{@}:--verbose
+... script goes here ...
+@end example
+
+When the above hash bang script is invoked with the arguments @samp{alpha} and
+@samp{omega}, @command{env} is invoked with four arguments: the
+argument @samp{:intercal:--strict-iso:@{@}:--verbose}, followed by the
+path name to the above script itself, followed by @samp{alpha} and @samp{omega}.
+The @command{env} utility parses the special notation in the command, producing
+the fields @samp{intercal}, @samp{--strict-iso}, @samp{@{@}} and
+@samp{--verbose}.  The @samp{@{@}} field is recognized and replaced with
+the first of the remaining arguments, which is the path to the script.
+This argument is then removed from the remaining arguments.  Then
+@command{env} searches @env{PATH} for @samp{intercal}.  Upon finding it,
+it executes the interpreter with the arguments @samp{--strict-iso},
+the name of the script, @samp{--verbose}, @samp{alpha} and @samp{omega}.
+
 The program accepts the following options.  Also see @ref{Common options}.
 Options must precede operands.
 
diff --git a/src/env.c b/src/env.c
index 63d5c2c..20fafdd 100644
--- a/src/env.c
+++ b/src/env.c
@@ -18,6 +18,7 @@
 
 #include <config.h>
 #include <stdio.h>
+#include <assert.h>
 #include <sys/types.h>
 #include <getopt.h>
 
@@ -70,11 +71,65 @@ Set each NAME to VALUE in the environment and run COMMAND.\n\
 \n\
 A mere - implies -i.  If no COMMAND, print the resulting environment.\n\
 "), stdout);
+      fputs (_("\
+\n\
+COMMAND supports a notation for encoding a command name plus one or more\n\
+arguments.  This is useful when env is used in #! (hash bang) scripting.\n\
+Please see the Info documentation for the details.\n\
+"), stdout);
       emit_ancillary_info (PROGRAM_NAME);
     }
   exit (status);
 }
 
+char **
+expand_command_notation(char **argv)
+{
+  char *command = xstrdup(argv[0] + 1), *p, **pp, **nargv;
+  int nf, argc, a, rest = 1;
+
+  for (nf = 1, p = command; *p; p++)
+    {
+      if (*p == ':')
+        nf++;
+    }
+
+  for (argc = 0, pp = argv; *pp; argc++, pp++)
+    ; /* empty */
+
+  argc += nf - 1;
+
+  if ((nargv = malloc((argc + 1) * sizeof *nargv)) == NULL || command == NULL)
+    die (EXIT_FAILURE, errno, _("out of memory"));
+
+  for (a = 0, p = command; ; p++)
+    {
+      char *arg = p;
+      char *end = p + strcspn(p, ":");
+      char ch = *end;
+
+      *end = 0;
+
+      if (rest < 2 && strcmp(arg, "{}") == 0 && argv[rest])
+        arg = argv[rest++];
+
+      nargv[a++] = arg;
+
+      if (ch == ':')
+        {
+          p = end;
+          continue;
+        }
+
+      break;
+    }
+
+  assert (a == nf);
+
+  memcpy(&nargv[a], &argv[rest], sizeof nargv[0] * (argc + 2 - rest - a));
+  return nargv;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -154,9 +209,14 @@ main (int argc, char **argv)
       usage (EXIT_CANCELED);
     }
 
-  execvp (argv[optind], &argv[optind]);
+  char **rest_argv = argv + optind;
+  char **down_argv = (rest_argv[0][0] == ':')
+                     ? expand_command_notation(rest_argv)
+                     : rest_argv;
+
+  execvp (down_argv[0], down_argv);
 
   int exit_status = errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE;
-  error (0, errno, "%s", quote (argv[optind]));
+  error (0, errno, "%s", quote (down_argv[0]));
   return exit_status;
 }
diff --git a/tests/misc/env.sh b/tests/misc/env.sh
index f2f6ba8..aeb2b91 100755
--- a/tests/misc/env.sh
+++ b/tests/misc/env.sh
@@ -150,4 +150,22 @@ test "x$(sh -c '\c=d echo fail')" = xpass && #dash 0.5.4 fails so check first
 returns_ 125 env -u a=b true || fail=1
 returns_ 125 env -u '' true || fail=1
 
+# test the special env :command... notation for encoding arguments
+test "$(env :echo)" = "" || fail=1
+test "$(env :echo:)" = "" || fail=1
+test "$(env :echo:a)" = "a" || fail=1
+test "$(env :echo:a:b)" = "a b" || fail=1
+test "$(env :echo:a b)" = "a b" || fail=1
+test "$(env :echo:a: b)" = "a  b" || fail=1
+test "$(env :echo:a:b c d)" = "a b c d" || fail=1
+test "$(env :echo:aa:bb cc dd)" = "aa bb cc dd" || fail=1
+test "$(env :echo:{}:bb cc dd)" = "cc bb dd" || fail=1
+test "$(env :echo:{} cc dd)" = "cc dd" || fail=1
+test "$(env :echo:{}:aa:bb cc dd)" = "cc aa bb dd" || fail=1
+test "$(env :echo:{})" = "{}" || fail=1
+test "$(env :echo:{}:{})" = "{} {}" || fail=1
+test "$(env :echo:{}:a)" = "{} a" || fail=1
+test "$(env :echo:{}:a b)" = "b a" || fail=1
+test "$(env :echo:{}:{} b)" = "b {}" || fail=1
+
 Exit $fail
-- 
2.9.3