Re: getopts appears to not be shifting $@ when consuming options
On 29/01/2021 20:15, Harald van Dijk wrote: I would suggest that if this is changed to conform to POSIX, a non-standard method should remain available to allow shell functions to use getopts internally, including when the getopts loop in the function calls other functions that themselves use getopts, to ensure that any existing scripts broken by the change can be easily updated. One way to achieve that could be to special-case "local OPTIND=1" so that when the function returns, it restores not just the value of OPTIND, but uses that moment to additionally restore the extra internal state. I have found that "local OPTIND=1" already works that way in bash, restoring upon function return not only the value of OPTIND but also the extra internal extra state. It is designed to work like that, not just a happy accident: it is listed as a new feature of 4.4. Because of that, if something like this is implemented, this would probably be the best syntax to use. I have also found that this is easy to implement: currently, getoptsreset() just sets the internal state (shellparam.optind and shellparam.optoff) based on the text value of OPTIND. Instead, if: - var.h defines a new flag to indicate that special hidden data is present past the end of OPTIND's text value, - getopts() builds its own buffer, including the extra hidden data, and calls setvareq() directly with that special flag rather than relying on setvarint(), - getoptsreset() checks it to either set the internal state as before or restore the internal state, depending on whether that special flag is set, everything just works with no special casing needed in the implementation of the "local" command. The net result is even a small reduction in code size. I am only posting a description of the changes rather than a patch or a link to a patch because I implemented this in my shell (a fork of dash), not dash itself, and the changes as I implemented them cannot be applied to dash without some modifications. Cheers, Harald van Dijk
Re: getopts appears to not be shifting $@ when consuming options
On 29/01/2021 20:36, Jilles Tjoelker wrote: Unfortunately, dash and FreeBSD sh reset the getopts state when the positional parameters are modified via set or shift. They probably do this to avoid use after free and out of bounds memory access when a script violates POSIX's rule. In dash, the getopts command guards against this. getoptscmd() checks that shellparam.optind is in bounds before calling getopts(), and getopts() checks that shellparam.optoff is in bounds, so as far as correctness goes, it should be enough to just remove the resetting of shellparam.optind and shellparam.optoff in those places that are not supposed to reset the state. Cheers, Harald van Dijk
Re: getopts appears to not be shifting $@ when consuming options
Hi, On 29/01/2021 18:25, earnestly wrote: In this example dash will repeatedly append 'attr=foo' to the list of parameters in an infinite loop: #!/bin/dash -x while getopts :a: arg -a foo -a bar; do case $arg in a) set -- "$@" attr="$OPTARG" esac done shift "$((OPTIND - 1))" Instead I expected this to result in parameter list containing 'attr=foo' and 'attr=bar'. You are correct in your expectation, I believe: ending the loop after processing both arguments is what your script should do. The reason that dash is behaving the way it does is that dash resets the getopts state when the positional arguments are changed by the set or shift commands. The getopts state is also reset when the positional arguments are changed because of a function call, but restored after the function returns. This is something shared across ash-based shells; you will also find it in FreeBSD's /bin/sh, and if I am not misreading the code, NetBSD's /bin/sh as well. Although there are certainly cases where this behaviour is useful, especially the part where it saves and restores state for function calls, there are also where it is not, such as yours. Additionally, it appears to be in conflict with POSIX, which requires the getopts state to be preserved as long as it continues to be called with the same arguments: only resetting OPTIND to 1 is specified to reset the state to allow arguments to be parsed anew. I would suggest that if this is changed to conform to POSIX, a non-standard method should remain available to allow shell functions to use getopts internally, including when the getopts loop in the function calls other functions that themselves use getopts, to ensure that any existing scripts broken by the change can be easily updated. One way to achieve that could be to special-case "local OPTIND=1" so that when the function returns, it restores not just the value of OPTIND, but uses that moment to additionally restore the extra internal state. Cheers, Harald van Dijk
Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think
On 23/12/2020 20:18, Harald van Dijk wrote: On 21/12/2020 16:24, Jilles Tjoelker wrote: Tradition is for job control shells to be a process group leader, but I don't really know why. Changing this will not fix the issue entirely anyway since the shell must perform tcsetpgrp() from the background when a foreground job has terminated. [...] This should, in my opinion, *only* happen for interactive shells, not for job-control-enabled non-interactive shells. [...] Consider also an extra difficulty arising from this process group switching: : | dash -m When a job-control-enabled shell terminates, or when job control is disabled via set +m, it attempts to re-join its initial process group and set that as the foreground process group. This can fail if the process group has ceased to exist after the shell left it, as it does here, resulting in: $ : | dash -m dash: 0: Cannot set tty process group (No such process) In theory, because of PID reuse, this may even result in some random unrelated process group temporarily becoming the foreground process group. I am leaning towards saying that once the shell has committed to becoming a process group leader, it should remain one. By basing this on the shell being interactive rather than on job control being enabled, this is easier to handle, as as far as POSIX is concerned "interactive" is a property that cannot be changed once the shell has started: set -i and set +i are extensions not required by POSIX, and if they are nonetheless supported it is easy to defend them not being fully equivalent to specifying or leaving out -i on the shell invocation. Cheers, Harald van Dijk
Re: [v3 PATCH 17/17] eval: Add vfork support
On 18/05/2018 19:39, Herbert Xu wrote: This patch adds basic vfork support for the case of a simple command. ... @@ -879,17 +892,30 @@ forkchild(struct job *jp, union node *n, int mode) } } if (!oldlvl && iflag) { - setsignal(SIGINT); - setsignal(SIGQUIT); + if (mode != FORK_BG) { + setsignal(SIGINT); + setsignal(SIGQUIT); + } setsignal(SIGTERM); } + + if (lvforked) + return; + for (jp = curjob; jp; jp = jp->prev_job) freejob(jp); } This leaves SIGQUIT ignored in background jobs in interactive shells. ENV= dash -ic 'dash -c "kill -QUIT \$\$; echo huh" & wait' As of dash 0.5.11, this prints "huh". Before, the subprocess process killed itself before it could print anything. Other shells do not leave SIGQUIT ignored. (In a few other shells, this also prints "huh", but in those other shells, that is because the inner shell chooses to ignore SIGQUIT, not because the outer shell leaves it ignored.) Cheers, Harald van Dijk
Re: [PATCH] jobs: Block signals during tcsetpgrp
On 06/01/2021 04:45, Herbert Xu wrote: This patch implements the blocking of SIGTTOU (and everything else) while we call tcsetpgrp. Reported-by: Steffen Nurpmeso Signed-off-by: Herbert Xu diff --git a/src/jobs.c b/src/jobs.c index 516786f..809f37c 100644 --- a/src/jobs.c +++ b/src/jobs.c @@ -1512,7 +1512,13 @@ showpipe(struct job *jp, struct output *out) STATIC void xtcsetpgrp(int fd, pid_t pgrp) { - if (tcsetpgrp(fd, pgrp)) + int err; + + sigblockall(NULL); + err = tcsetpgrp(fd, pgrp); + sigclearmask(); + + if (err) sh_error("Cannot set tty process group (%s)", strerror(errno)); } #endif While this is a step in the right direction, Jilles has already replied with an explanation of why this is not enough: if the terminal is in TOSTOP mode, it's not just tcsetpgrp() that needs to be handled, it's any write as well that may occur while the shell is not in the foreground process group. While it may be working according to design for messages written when the shell is not supposed to be in the foreground process group, it is another story when the shell is both responsible for taking itself out of the foreground process group and for writing a message. This is made worse by the fact that there is no synchronisation with child processes on errors, so even forcibly restoring the foreground process group may not be enough: unfortunate scheduling may result in a child process immediately setting the foreground process group to the child process after the parent process attempted to restore it to itself. I have not yet seen a good solution for this. Cheers, Harald van Dijk
Re: set -I is not required by standard, and does not match bash
On 04/01/2021 12:22, Denys Vlasenko wrote: Hello, In dash, set -I is a short-option alias to set -o ignoreeof. However, bash does not have such alias, it has an undocumented set -I which switches off "invisible variables" (I don't know what that is). bash does not have any set -I. bash *had* a set -I, but dropped it in version 5.1, so there is no longer a compatibility issue to worry about. Cheers, Harald van Dijk
Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think
On 21/12/2020 16:24, Jilles Tjoelker wrote: Also, simply entering the command trap "echo TTOU" TTOU in an interactive shell makes it behave strangely. Created jobs immediately stop, "fg" works once but after that the shell tends to get stuck quickly. Good catch, there is some work to be done there too. This seems a good approach, but certain writes to the terminal may need the same treatment, for example the error message when fork fails for the second and further elements of a pipeline. This also makes me wonder why SIGTTOU is ignored at all by default. True. This is hard to create a reliable test case for, but there is only limited code that can get executed while a job-control-enabled shell may not be in the foreground process group. If fork fails halfway through a pipeline, it may also be a problem that some of the commands in the pipeline may still run. In mksh, the issue is resolved slightly differently: setting a trap on TTOU pretends to work but the signal disposition remains set to ignored. zsh also rejects traps on TTOU, but does so explicitly: zsh: can't trap SIGTTOU in interactive shells Amusingly, it prints this in any shell where job control is enabled, regardless of whether the shell is interactive. Tradition is for job control shells to be a process group leader, but I don't really know why. Changing this will not fix the issue entirely anyway since the shell must perform tcsetpgrp() from the background when a foreground job has terminated. I've been thinking more about this, and I suspect it's a another conflation between interactive mode and job control. In an interactive shell that's not executing any external program, you want any Ctrl-C to only send SIGINT to that shell, not to any other process. In order for that to work, it needs to be its own process group. This should, in my opinion, *only* happen for interactive shells, not for job-control-enabled non-interactive shells. Consider sh -c 'sh -mc "while :; do :; done"; echo bye' The behaviour I would want is that Ctrl-C kills the parent shell, so that this does not print "bye". Different shells behave differently. What is definitely required, though, is that the shell not read input or alter terminal settings if it is started in the background (possibly unless the script explicitly ignored SIGTTOU). This is what the loop with tcgetpgrp() does. Definitely. Cheers, Harald van Dijk
Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think
On 19/12/2020 22:21, Steffen Nurpmeso wrote: Steffen Nurpmeso wrote in <20201219172838.1b-wb%stef...@sdaoden.eu>: |Long story short, after falsely accusing BSD make of not working After dinner i shortened it a bit more, and attach it again, ok? It is terrible, but now less redundant than before. Sorry for being so terse, that problem crosses my head for about a week, and i was totally mislead and if you bang your head against the wall so many hours bugs or misbehaviours in a handful of other programs is not the expected outcome. I think a minimal test case is simply all: $(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good' unless I accidentally oversimplified. The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make its newly started process group the foreground process group when job control is enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's in dash, the other variants may have some small differences.) tcsetpgrp has this little bit in its specification: Attempts to use tcsetpgrp() from a process which is a member of a background process group on a fildes associated with its con‐ trolling terminal shall cause the process group to be sent a SIGTTOU signal. If the calling thread is blocking SIGTTOU sig‐ nals or the process is ignoring SIGTTOU signals, the process shall be allowed to perform the operation, and no signal is sent. Ordinarily, when job control is enabled, SIGTTOU is ignored. However, when a trap action is specified for SIGTTOU, the signal is not ignored, and there is no blocking in place either, so the tcsetpgrp() call is not allowed. The lowest impact change to make here, the one that otherwise preserves the existing shell behaviour, is to block signals before calling tcsetpgrp and unblocking them afterwards. This ensures SIGTTOU does not get raised here, but also ensures that if SIGTTOU is sent to the shell for another reason, there is no window where it gets silently ignored. Another way to fix this is by not trying to make the shell start a new process group, or at least not make it the foreground process group. Most other shells appear to not try to do this. Cheers, Harald van Dijk
Re: Changes to job handling cause hangs in wait
On 01/12/2020 10:56, Herbert Xu wrote: On Tue, Dec 01, 2020 at 10:55:06AM +, Harald van Dijk wrote: You wrote: "So the problem is really in the parent of this shell, which appears to be bash:" You should read my follow-up email too that suggested changing the systemd script. I do not appreciate your false accusation that I did not read your follow-up email. You suggested that as an alternative, without retracting your claim that this is a problem in bash. Cheers, Harald van Dijk
Re: Changes to job handling cause hangs in wait
On 01/12/2020 10:53, Herbert Xu wrote: On Tue, Dec 01, 2020 at 10:50:19AM +, Harald van Dijk wrote: This used to exit immediately, leaving the sleep process running in the background without waiting for it. On the dash that's currently provided by Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from Git, it does not. This is clearly a change in behaviour in dash not in response to any bug (real or not) in bash. I'm not suggesting it's a bug in bash. You wrote: "So the problem is really in the parent of this shell, which appears to be bash:" Cheers, Harald van Dijk
Re: Changes to job handling cause hangs in wait
On 01/12/2020 10:34, Herbert Xu wrote: On Tue, Dec 01, 2020 at 10:14:04AM +, Harald van Dijk wrote: POSIX says: "If the wait utility is invoked with no operands, it shall wait until all process IDs known to the invoking shell have terminated and exit with a zero exit status." I would say that child processes that were created before dash was started do not have process IDs known to dash. Well I disagree Then dash still has a bug regardless: bash -c 'dash -c "wait" <(sleep 1d)' Here, dash does not wait for the child process. The sleep process is either known to dash, in which case dash must wait in both examples, or not known to dash, in which case dash must wait in neither example. and the fact that the original ksh does the same thing is an important fact. The original ksh is well-known not to conform to the current POSIX requirements, in some cases because POSIX changed requirements, in some cases because ksh's behaviour was clearly buggy and not worth standardising. bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true; . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout); For some reason this is causing the final two tee's to be created as children of debian/tests/timedated rather than the bash shell. This is because of the same optimisation that dash also has, where it tries to avoid creating a subshell for the last command in a list when it can just exec() without a fork() instead. A minimal example without an explicit exec is bash -c 'dash -c ": & wait" <(sleep 1d)' I'm not sure about that because bash itself is still hanging around, if it were really the -c optimisation then bash should not appear in the ps output at all. Good point, it's a bit more complicated. Can take a more in-depth look at that example later. Here's a simpler example without bash at all, then, where it is far clearer that the -c optimisation is responsible: dash -c 'sleep 1d & dash -c ": & wait"' This used to exit immediately, leaving the sleep process running in the background without waiting for it. On the dash that's currently provided by Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from Git, it does not. This is clearly a change in behaviour in dash not in response to any bug (real or not) in bash. Cheers, Harald van Dijk
Re: Changes to job handling cause hangs in wait
On 01/12/2020 06:06, Herbert Xu wrote: On Tue, Dec 01, 2020 at 04:42:03PM +1100, Herbert Xu wrote: Nevermind, I see that the script has been modified to use bash. I can reproduce the problem now so it's all good. OK the problem is this: sh -c 'sleep 1d& exec $MYSHELL -c "sleep 1& wait"' You can replace MYSHELL with whatever shell you want to use. Essentially dash will now wait for all children, even ones that were created prior to its existence, however, bash only waits for children that it created directly. FWIW ksh exhibits the same behaviour as dash and I think there is nothing wrong with this. POSIX says: "If the wait utility is invoked with no operands, it shall wait until all process IDs known to the invoking shell have terminated and exit with a zero exit status." I would say that child processes that were created before dash was started do not have process IDs known to dash. So the problem is really in the parent of this shell, which appears to be bash: bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true; . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout); For some reason this is causing the final two tee's to be created as children of debian/tests/timedated rather than the bash shell. This is because of the same optimisation that dash also has, where it tries to avoid creating a subshell for the last command in a list when it can just exec() without a fork() instead. A minimal example without an explicit exec is bash -c 'dash -c ": & wait" <(sleep 1d)' Cheers, Harald van Dijk
Re: [bug] dash eats one line
Hi, On 27/11/2020 12:16, Pedro wrote: Hi, I am using dash on debian 10 buster (stable release) which according to `dpkg -S dash` is `Version: 0.5.10.2-5` evilham encountered a bug where dash eats one line but not in freebsd's shell nor bash attached you can find an example to reproduce it execution of the example in my computer (enableUFWRules is not in new line): ./dash-bug.sh # Enable UFW rules only if requested (it was)enableUFWRules $ bash dash-bug.sh # Enable UFW rules only if requested (it was) enableUFWRules This looks like <https://www.mail-archive.com/dash@vger.kernel.org/msg01883.html>, which has been fixed in dash 0.5.11. I can confirm that your script produces the expected output with dash 0.5.11 as well as dash built from current Git. Cheers, Harald van Dijk
Re: [PATCH] Don't include config.h when building helpers using the native compiler
On 22/06/2020 06:25, Fabrice Fontaine wrote: config.h contains settings for the cross compiler (most importantly 32/64bit versions of functions), so don't include it when calling the native compiler to build the helpers. Otherwise we get build errors like: /usr/bin/gcc -include ../config.h -DBSD=1 -DSHELL -DIFS_BROKEN -g -O2 -Wall -o mkinit mkinit.c In file included from /usr/include/sys/stat.h:107, from /usr/include/fcntl.h:38, from mkinit.c:50: /usr/include/bits/stat.h:117: error: redefinition of ‘struct stat’ In file included from /usr/include/fcntl.h:38, from mkinit.c:50: /usr/include/sys/stat.h:504: error: redefinition of ‘stat’ /usr/include/sys/stat.h:455: note: previous definition of ‘stat’ was here Signed-off-by: Peter Korsgaard [Retrieved from: https://git.buildroot.net/buildroot/tree/package/dash/0001-no-config.h-for-helpers.patch] Signed-off-by: Fabrice Fontaine A better version of this patch has already been submitted and accepted in 2018: <https://www.spinics.net/lists/dash/msg01629.html>. Cheers, Harald van Dijk
Re: Improving xtrace output from subshells
On 29/04/2019 23:55, Harald van Dijk wrote: On 29/04/2019 15:44, Vadim Zeitlin wrote: On Sat, 27 Apr 2019 22:58:23 +0100 Harald van Dijk wrote: HvD> It wastes memory: the code should be re-worked so that output and HvD> preverrout share the same buffer, given that there can never be pending HvD> output for one when the other is being written to. I'm a bit worried about this one, it doesn't seem completely obvious to me that the 2 objects can't be used at the same time. It's indeed not completely obvious, but all bits that print to output will flush the buffer when they're done, before a next command starts executing, before the next command requires an xtrace string to be printed. It's usually the flushall() in evalbltin() that handles this. I have managed to verify that using a single buffer for all streams (output, errout and preverrout) works well and simplifies logic so that the shell with that change is actually smaller than the one without it. I do not know whether there would be interest in getting that in dash. Cheers, Harald van Dijk
Re: Bug in Dash's unquoting of backslashes within backquoted strings
On 21/05/2020 22:38, Ron Yorston wrote: Matt Whitlock wrote: A minimal example: : `: " \\$(bug)"` However, when it appears inside a backquoted subcommand (with the backslash characters being appropriately escaped), such as given at the top of this report, then Dash processes it incorrectly: /bin/sh: 1: bug: not found This seems to have been introduced by commit 6bbc71d (parser: use pgetc_eatbnl() in more places). Reverting the following part of the commit makes the problem go away: Yes, that commit (a patch I had submitted) was buggy and the part that you suggest reverting should be. Cheers, Harald van Dijk
Re: [v2 PATCH] parser: Catch errors in expandstr
On 28/04/2020 07:17, Herbert Xu wrote: ---8<--- On Fri, Dec 13, 2019 at 02:51:34PM +, Simon Ser wrote: Just noticed another dash bug: when setting invalid PS1 values dash enters an infinite loop. For instance, setting PS1='$(' makes dash print many of these: dash: 1: Syntax error: end of file unexpected (expecting ")") It would be nice to fallback to the default PS1 value on error. This patch fixes it by using the literal value of PS1 should an error occur during expansion. On Wed, Feb 26, 2020 at 09:12:04PM +, Ron Yorston wrote: There's another case that should be handled. PS1='`xxx(`' causes the shell to exit because the old-style backquote leaves an additional file on the stack. Ron's change has been folded into this patch. This still does not restore the state completely. It does not clean up any pending heredocs. I see: $ PS1='$(< That is, after entering the ':' command, the shell is still trying to read the heredoc from the prompt. I have not looked in detail to see if anything else is not getting cleaned up that should be. Cheers, Harald van Dijk
Re: [PATCH] build: delete AC_PROG_YACC
On 10/10/2019 17:42, Fangrui Song wrote: src/arith.y was deleted by commit f6e3b2 [ARITH] Add assignment and intmax_t support. We can delete AC_PROG_YACC to get rid of the build-time dependency. Just a minor point of clarification: there is no build-time dependency on bison/yacc. The configure script checks whether bison/yacc exists, but if not, happily continues. This change does not get rid of a build-time dependency, it cleans up the configure script by removing an unused check. Which is obviously still an improvement so your patch should still go in. Cheers, Harald van Dijk
Re: Improving xtrace output from subshells
On 27/04/2019 15:05, Vadim Zeitlin wrote: On Sat, 27 Apr 2019 01:14:29 +0100 Harald van Dijk wrote: HvD> I don't believe there is any requirement for your expectation and HvD> several other shells share this behaviour, but agreed that it would be a HvD> nice improvement to avoid this. Yes, this is certainly just an expectation and not something guaranteed by POSIX, but it looks like a very reasonable expectation to me. And, speaking of the other shells, all of the ones I tested (bash, ksh, zsh) do keep the lines together. bosh and yash behave the same as dash. HvD> Please keep in mind that even without this, you would not have HvD> deterministic output. The commands in a pipeline could be output in any HvD> order, even if each command would be output on its own line. In theory, it seems like you're right and it should be possible, but in practice it just doesn't seem to ever happen, i.e. the output is always in the same left-to-right order. Maybe there is something that implicitly serializing the output from the pipeline components either in the shell or in the kernel? I admit I haven't dived into this deep enough to really find out, the fact that the order remains always the same in practice is good enough for me. It seems to mostly work that way in bash, and I have no dug into the reason for that either. You tested ksh and zsh as well. In ksh and zsh, I do commonly see variations in the order in which the lines are printed, and with patched dash, I see the same variations. Rarely, I see out-of-order prints even in bash, so it does seem like it's just a coincidence as a result of the way of bash is written, not something that bash actively attempts to guarantee. Try it: c=0 while : do xtrace=$(bash -o posix -xc ': a | : b' 2>&1) case $xtrace in *a*b*) : $((c+=1)) ;; *) echo $c echo "$xtrace" break ;; esac done It takes hundreds or sometimes even thousands of runs, but eventually, I do see + : b followed by + : a. HvD> As a proof of concept that should clearly not be committed in its HvD> current form, please see the attached patch. Thanks, this is indeed much simpler than what I had in mind and I can confirm that it does work. What prevents this patch from being applied in the current form exactly? I removed a #ifdef FLUSHERR which suggests to me this is supposed to be optional, not unconditional. The macro OUTBUFSIZ should be renamed if it is no longer just the size of the "output" buffer. I only touched the case where USE_GLIBC_STDIO is undefined. (That said, the case where USE_GLIBC_STDIO is defined is already broken, so this may be okay.) It wastes memory: the code should be re-worked so that output and preverrout share the same buffer, given that there can never be pending output for one when the other is being written to. (This may apply to errout as well, if someone wants to enable buffering for that.) Cheers, Harald van Dijk
Re: Improving xtrace output from subshells
On 25/04/2019 16:55, Vadim Zeitlin wrote: Hello, dash currently has a problem with output from "-x" option being non-deterministic when subshells are used (e.g. whenever pipelines are). This was reported a long time ago as a Debian bug[1] but AFAICS has never been posted here, so at the very least I'd like to do it and, ideally, also fix it. But first let me describe the bug in more details: When you run the following command: $ dash -xc 'z=`echo | cut -f1 | sort`' the following output would be expected: + echo + cut -f1 + sort + z= And, indeed, this is what you usually get. However the output is non-deterministic and you can also get weirder things, with the output from different subshells intermixed together. A simple way to see it is to run the following loop: $ while true; do dash -xc 'z=`echo | cut -f1 | sort`' 2>&1 | grep cut; done | fgrep -vx '+ cut -f1' which is not supposed to output anything, but in practice it does quickly give some output, e.g. here is what I see when running it with the latest dash version from git (48875c1201930d1e71d358eb1cf3eacc166795be): cut -f1 + + cut -f1 + cut+ -f1 + cut+ -f1 + cut+ -f1 cut -f1 + + echocut cut -f1 cut -f1 + + cutsort + cut -f1+ I don't believe there is any requirement for your expectation and several other shells share this behaviour, but agreed that it would be a nice improvement to avoid this. In practice, this is very annoying as it makes it difficult to compare the logs of the same script run using "-x" as for any non-trivial amount of output there will always be spurious differences between the output produced by the different runs. As the comment from Tomi Ollila in the Debian bug report says, the problem is due to using several different write() calls for output. Looking at the sources, this happens inside "if (xflag)" block in evalcommand() function starting at the line 854 in eval.c, where outstr(), eprintlist() twice, and outcslow() are used for the single line of output. Please keep in mind that even without this, you would not have deterministic output. The commands in a pipeline could be output in any order, even if each command would be output on its own line. To fix the problem we need to use only a single call instead and I see 2 possibilities here: first the obvious one which would be to output everything in a temporary memory buffer and output the entire buffer at once to preverrout at the end. This approach has some problems, however: 1. It would consume some extra memory and while this should be negligible in practice, it's arguably still better to avoid it. 2. Support for memory output is disabled using "#ifdef notyet" in the current sources and enabling it would enable quite a few other things, so I'm not sure if it's really a good idea. 3. Without enabling memory output, the code would need to be changed relatively extensively and I'd need to either refactor mkinit.c to extract "struct text" and the related addstr()/addchar() functions from it and make it possible to reuse it from eval.c, or reimplement something quite close to it, which is not appealing. The #ifdef notyet is more than is needed for this. The output mechanism already supports buffering. It's easy to adapt this to work for xtrace output as well. You'd only run into issues if you have a command string that's larger than the buffer size, but I think that should be considered acceptable. As a proof of concept that should clearly not be committed in its current form, please see the attached patch. The other (and the only other, AFAICS) possibility would be to use writev() to output everything at once instead. This avoids (1) (but, again, I don't think it's really worth avoiding it that much), but would also require making rather extensive changes to the code. I'm also not sure if writev() is supported on all platforms targeted by dash, but hopefully it should be, in 2019. I'd appreciate your advice about what kind of approach would be the most appropriate and most likely to be accepted. Personally, I think of enabling support for memory-backed output structs (only, i.e. without touching all the rest of the code guarded by "#ifdef notyet") and using it here, but if this is undesirable, for whatever reason, I'd go with writev() because as long as we need to change the code structure anyhow, it's not really more complicated than using a temporary buffer. Thanks in advance! VZ [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567648 --- a/src/eval.c +++ b/src/eval.c @@ -861,9 +861,7 @@ bail: sep = eprintlist(out, varlist.list, sep); eprintlist(out, osp, sep); outcslow('\n', out); -#ifdef FLUSHERR flushout(out); -#endif } /* Now locate the command. */ --- a/src/output.c +++ b/src/output.c @@ -88,7 +88,9 @@ struct output output
Re: POSIX compliance issues with case statements
Hi, On 17/04/2019 17:29, Drew DeVault wrote: Unless I'm misinterpreting the specification, it seems like dash doesn't handle pattern matching in case statements correctly. The following sample demonstrates the issue: #!/bin/sh -eu while read -r line This will strip leading and trailing whitespace. To prevent that, you need to set IFS to an empty string first. do case "$line" in [:space:]*:) This will match on a first character that is ':', 's', 'p', 'a', 'c', 'e', or ':' (again), not a first character that is a space. You want [[:space:]]*: here. echo "a" ;; *:) echo "b" ;; *) echo "c" ;; esac done <https://mirror.sr.ht/alpine/sr.ht/ EOF The expected output is b a c c, but the output in practice is b b c c. Quoting the spec: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05 ...case shall execute the compound-list corresponding to the first one of several patterns (see Pattern Matching Notation)... http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13 ...patterns matching a single character shall match a single character: ordinary characters, special pattern characters, and pattern bracket expressions... ...an open bracket... shall introduce a pattern bracket expression... as in XBD RE Bracket Expression... http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05 ...The following character class expressions shall be supported in all locales: [:space:] Character class expressions cannot be used by themselves, they can only be part of bracket expressions, which is why you need the double [[ and ]]. Am I misinterpreting the spec or is this indeed a problem with dash? In addition to this issue, the following example cause a parsing error: [ \t]) Both problems can be reproduced with bash. [ \t] does not work because the space ends the word. Additionally, \t does not mean "tab", it just means a literal "t". You could use [\ \ ] instead, or [" "], where in both cases the second blank is a literal tab. Cheers, Harald van Dijk
Re: [v2 PATCH] eval: Reset handler when entering a subshell
On 03/03/2019 17:39, Jilles Tjoelker wrote: On Sun, Mar 03, 2019 at 04:43:09PM +, Harald van Dijk wrote: The effect of the command built-in is that "if the command_name is the same as the name of one of the special built-in utilities, the special properties in the enumerated list at the beginning of Special Built-In Utilities shall not occur." This is ambiguous as to whether it is just about the special properties associated with the . command, or whether it includes those associated with the set command called indirectly, but it must be one or the other. If it includes the set command, then the shell shall not exit, and the correct output is a followed by b. If it does not include the set command, then the shell shall exit, and the correct output is nothing. dash outputs b, which is wrong in either case. The way I interpret this is that any error while parsing or executing the . or eval command(s) is a "special built-in utility error" which can be "caught" using "command". Therefore, the correct output is an error message about the invalid option (to stderr) followed by b. This is what happens in FreeBSD sh (for quite a few years), ksh93 (20120801_2) and mksh (56c). That's how dash implements it too, but it plainly doesn't make sense based on POSIX's requirements. If the effect of "shall exit" does not occur, the result isn't "shall start exiting and then abort the exit", it's "shall not exit". In order to allow dash/FreeBSD sh's interpretation, POSIX would need to say something like "the special properties [...] shall be suppressed" rather than "[...] shall not occur". Cheers, Harald van Dijk
Re: [v2 PATCH] eval: Reset handler when entering a subshell
On 03/03/2019 13:57, Herbert Xu wrote: On Sat, Mar 02, 2019 at 01:28:44PM +, Harald van Dijk wrote: That is *a* real bug here, not *the* real bug. This leaves the buggy behaviour of the "command" built-in intact in the test case I included in the message you replied to. I don't quite understand. Could you explain what is still buggy after the following patch? I gave this example in my previous message: command . /dev/stdin <Ordinarily, "set -o invalid" is a "special built-in utility error", for which the consequence is that a non-interactive shell "shall exit". If it weren't a special built-in utility, it would be an "other utility (not a special built-in) error", for which a non-interactive shell "shall not exit". The effect of the command built-in is that "if the command_name is the same as the name of one of the special built-in utilities, the special properties in the enumerated list at the beginning of Special Built-In Utilities shall not occur." This is ambiguous as to whether it is just about the special properties associated with the . command, or whether it includes those associated with the set command called indirectly, but it must be one or the other. If it includes the set command, then the shell shall not exit, and the correct output is a followed by b. If it does not include the set command, then the shell shall exit, and the correct output is nothing. dash outputs b, which is wrong in either case. Cheers, Harald van Dijk
Re: [PATCH] shell: Reset handler when entering a subshell
On 28/02/2019 06:27, Herbert Xu wrote: Thanks for the analysis. I think the real bug here is the fact that subshells can "escape" their jail by a longjmp that is caught by evalcommand. That is *a* real bug here, not *the* real bug. This leaves the buggy behaviour of the "command" built-in intact in the test case I included in the message you replied to. For fixing the one bug, it is a sensible approach, but keep in mind that when a fork is omitted because of EV_EXIT, so too will the reset of handler. You may be able to get away with that as long as you do not propagate EV_EXIT in any cases where a cleanup action might cause problems. Cheers, Harald van Dijk
Re: [PATCH v3] redir: Handle nested exec within REALLY_CLOSED redirection
On 16/01/2019 21:53, Martijn Dekker wrote: Op 16-01-19 om 22:39 schreef Martijn Dekker: Op 16-01-19 om 14:32 schreef Herbert Xu: Thanks for the patch. I took a deeper look into the history of the bug and it turned out that I added REALLY_CLOSED as an optimisation in order to avoid an unnecessary close(2) syscall. Does this actually save cycles? I'm probably missing something, but that code looks to me like it probably uses more cycles than close(2) going 'descriptor not open, return'. Never mind, stupid question that I should have googled before asking it. The answer is that a switch to kernel mode and back is expensive. Still, if that's slow enough and happens commonly enough that it's worth avoiding, it seems like it would still be simpler, shorter and probably faster to just write if (fd != -1) close(fd); in a few places, since we know that the only invalid fd that ever gets passed to close() is -1. That avoids the other cases where dash already happily calls close(-1) as well. Cheers, Harald van Dijk
Re: [BUG] dash hangs on 'eval' syntax error in dot script
On 10/01/2019 14:21, Martijn Dekker wrote: Op 10-01-19 om 11:37 schreef Martijn Dekker: In a dot script sourced with 'command .' (which is useful to avoid exiting if the script doesn't exist), triggering a syntax error in an 'eval' in a subshell causes dash to hang at the end of the main script. In fact, 'eval' doesn't appear related. I can also trigger the bug by triggering an error in another special builtin: command . /dev/stdin < I do not see a hang myself, but I definitely see wrong behaviour: the output I get is $ command . /dev/stdin < ( set -o foobar ) && echo WOOPS > EOF src/dash: 1: set: Illegal option -o foobar $ echo end end $ WOOPS $ This was introduced by commit 46d5a7fcea81b489819f753451c1ad2fe435f148 Author: Herbert Xu Date: Tue Mar 27 00:39:35 2018 +0800 eval: Restore input files in evalcommand When evalcommand invokes a command that modifies parsefile and then bails out without popping the file, we need to ensure the input file is restored so that the shell can continue to execute. What I think it going on here, although I do not understand it completely yet, is: evalcommand invokes a command that modifies parsefile: that's the dot command. The evaluation of the dotted file involves creating a subshell, and because of the invalid option, that subshell exits with EXERROR. Because the subshell is invoked from within the command builtin, that EXERROR is eaten, and the input file is restored. The subshell then happily continues reading commands at the same time as the parent shell. Although this particular example used to work before that specific commit, I suspect the underlying problem had been there already and other examples could expose it just as well. Fundamentally, I think an important question to ask is whether command . /dev/stdin <is supposed to abort the shell because of the invalid option, and if not, whether it should print a and b, or only b. Personally, I think there is nothing in POSIX to suggest that only b gets printed: either the special property of special built-in utilities that they cause the shell to exit on error apply, in which case nothing gets printed, or they do not apply, in which both a and b get printed. Nonetheless, printing b is the most popular result: bash/yash/zsh print nothing. dash/ksh/mksh/pdksh/posh print b. bosh prints a and b. My personal feeling is that "print nothing" is the correct result: the command built-in should only cause the immediately invoked command to not be treated as a special built-in utility, not anything indirectly invoked by that. Cheers, Harald van Dijk
Re: [BUG] 'fc -s' infinite loop
On 01/01/2019 16:24, Harald van Dijk wrote: On 01/01/2019 14:27, Martijn Dekker wrote: On dash and on gwsh, the 'fc -s' command, that re-executes a command from the history, executes the command repeatedly in an infinite loop. It should execute it just once. This is caused by the block with the XXX comment in src/histedit.c: if (displayhist && hist) { /* * XXX what about recursive and * relative histnums. */ history(hist, , H_ENTER, s); } This changes the position in the history list and also clobbers he, so the if (he.num == last) check afterwards always returns false, and if the check was supposed to return false, the next history(hist, , direction) would not do what was intended. The immediate problem can be fixed either by dropping support for the non-standard and undocumented fc -s first last and immediately breaking out of the loop, or by restoring the position (restoring he at the same time). Keeping fc -s first last working is pretty much not an option without modifying it to generate a full list of commands to execute first. Otherwise, the comment /* * At end? (if we were to lose last, we'd sure be * messed up). */ is sure to hit: executing the new commands may clear out previous history entries, including last. fc -s first does not have that problem since it only takes a single command -- except that it should enter the command into the history before executing it, not after it, at which point you can get that problem after all. However... But there are more problems: H_ENTER adds a new command to the history. bash and ksh do not do this: echo hello fc -s 1 fc -l will show (bash, but ksh is similar) hello 1 echo hello 2 echo hello That is, the echo hello command appears twice in the history, but the fc -s 1 command does not appear at all. This is what POSIX probably[*] requires as well. ...modifying fc to use H_REPLACE nicely avoids that problem. Since no new history entry is created, no old entry can be cleared out. By the time fc is called, the fc line is already in the history, so to get the bash/ksh behaviour, the current command would need to be overwritten. The documentation of libedit does not show any option which can overwrite entries already created. It documents H_ADD and H_APPEND, which can append, and has an undocumented H_REPLACE used by the equally undocumented replace_history_entry() function in . I do not know yet if it makes sense to start using this. Having thought about it, I think it does make sense, and have looked at how to use H_REPLACE. It takes a char * and a void *, where the char * specifies the new string for the currently selected history entry, and the void * is application-specific data. Since no application-specific data is ever used, it can simply be hardcoded to (void *) NULL, making sure to use a cast because history() is a variadic function. Since H_REPLACE operates on the currently selected history entry, it requires H_FIRST to re-select the most-recent history entry. A corner case is when one command line consists of multiple fc commands: echo abc echo def fc -s 1; fc -s 2 Here, bash and ksh let the last fc -s command "win": after this, the history is echo abc echo def echo def but the output is abc def abc def Using H_REPLACE replicates that. Also, when neither -s nor -l is specified and an editor is invoked, the new command is supposed to be entered into the history as well. That too is not implemented. This can be implemented by modifying the parser to allow creating history entries for files pushed back later. Currently, the check is limited to stdin: if (parsefile->fd == 0 && hist && something) but keeping track of whether the history should be saved for each file, and then turning it on when fc without -s/-l is used, lets this work. Other bugs: - fc without arguments is perfectly valid and should not raise an error. - The extension to allow negative numeric arguments without -- should work on BSD but does not work on glibc, because optind is set to 0 rather than 1. - Commands containing blank lines are entered into the history without the blank lines. Entirely blank commands should not be entered in the history, but echo "hello world" should not enter the history as echo "hello world" since that has a different effect. - fc -s's old=new option should not be accepted and ignored when -s is not specified. old=new is supposed to be interpreted as a search string in that case. - When an out-of-range numbe
Re: [Bug report] Incorrect handling of backslashes in read -r
On 04/01/2019 13:20, Peter Wu wrote: Hi, Dash 'read' builtin with the '-r' option is not POSIX-compliant: $ printf 'omg\\bar\\x41--\n' > input $ dash -c 'read -r x < input; cat input; echo "$x"' omg\bar\x41-\\- omar\x41-\- The outputs are expected to be equal (which is the case with bash). Originally found in dash 0.5.8-2.10 (Ubuntu 18.04), but also reproduced with dash 0.5.10.2-1 (Arch Linux). This isn't read giving the backslashes special treatment, it's echo. Use printf "%s\n" "$x" instead and you will get the same output. Cheers, Harald van Dijk
Re: [BUG] 'fc -s' infinite loop
On 01/01/2019 18:59, Martijn Dekker wrote: Op 01-01-19 om 17:24 schreef Harald van Dijk: The immediate problem can be fixed either by dropping support for the non-standard and undocumented fc -s [...] Actually, it's standard: I didn't say fc -s was non-standard, I said fc -s first last was non-standard. POSIX does not allow specifying last in combination with -s. Cheers, Harald van Dijk
Re: [v2 PATCH] eval: Only restore exit status on exit/return
On 14/12/2018 10:07, Harald van Dijk wrote: On 14/12/2018 09:47, Herbert Xu wrote: On Fri, Dec 14, 2018 at 09:37:09AM +, Harald van Dijk wrote: Still, that works as well, doesn't it? The control flow is basically the same so the logic in my other message applies. returncmd() doesn't use exceptions, so you get to dotrap() which already resets exitstatus if necessary. returncmd itself doesn't jump up but it may be part of a subshell which would execute a jump to the top as part of the EV_EXIT optimisation. That's where you'd need to restore the exit status. Ah, okay, so for something like trap 'false; (return); echo $?' EXIT you want to remember that it's executed as part of a trap action and print 0, not 1. I actually want that to print 1, so for me it's not a problem. However, my patch can be trivially modified to handle that: just have returncmd() check savestatus the same way exitcmd() does. I think that also allows merging SKIPFUNC and SKIPFUNCDEF, since you can then assume exitstatus was always set appropriately, so it allows for some further reduction in code. I think this is needed regardless to fix a bug as well: trap 'f() { false; return; }; f; echo $?' EXIT Other shells are evenly divided, but POSIX clearly requires this to print 0 (which is what bash/ksh/mksh/yash do): calling a function does not reset any trap information, so this return statement must be considered to execute in a trap action. It currently prints 1 (also seen in bosh/pdksh/posh/zsh). With my suggested changes, it prints 0.
Re: [v2 PATCH] eval: Only restore exit status on exit/return
On 14/12/2018 09:47, Herbert Xu wrote: On Fri, Dec 14, 2018 at 09:37:09AM +, Harald van Dijk wrote: Still, that works as well, doesn't it? The control flow is basically the same so the logic in my other message applies. returncmd() doesn't use exceptions, so you get to dotrap() which already resets exitstatus if necessary. returncmd itself doesn't jump up but it may be part of a subshell which would execute a jump to the top as part of the EV_EXIT optimisation. That's where you'd need to restore the exit status. Ah, okay, so for something like trap 'false; (return); echo $?' EXIT you want to remember that it's executed as part of a trap action and print 0, not 1. I actually want that to print 1, so for me it's not a problem. However, my patch can be trivially modified to handle that: just have returncmd() check savestatus the same way exitcmd() does. I think that also allows merging SKIPFUNC and SKIPFUNCDEF, since you can then assume exitstatus was always set appropriately, so it allows for some further reduction in code.
Re: [v2 PATCH] eval: Only restore exit status on exit/return
On 14/12/2018 08:02, Harald van Dijk wrote: On 14/12/2018 03:10, Herbert Xu wrote: On Sat, Dec 08, 2018 at 03:45:11PM +, Harald van Dijk wrote: --- a/src/eval.c +++ b/src/eval.c @@ -116,10 +116,7 @@ INCLUDE "eval.h" EXITRESET { evalskip = 0; loopnest = 0; - if (savestatus >= 0) { - exitstatus = savestatus; - savestatus = -1; - } + savestatus = -1; } #endif The reason this is needed is to support a naked return. With your patch a naked return would either have to always return the saved status or the new status. Neither of which is what we want. I actually saw the commit you referenced already and tested that with my patch before sending. This is how I tested it: Returns 1: f() { false; return } f Returns 0: f() { trap "false; return" USR1 kill -USR1 $$ } f The former picks up the status of the false command, the latter of the kill command. This works because although returncmd() only looks at exitstatus, so returns the wrong value, in the no-argument version it sets skip to SKIPFUNCDEF, causing dotrap() to restore the original exitstatus value, whereas in the version that does take arguments, skip is set to SKIPFUNC, causing dotrap() to leave exitstatus alone. Which is exactly how that particular commit was supposed to work in the first place, so perhaps it's simpler to put it as "in these tests, changes to exitreset() do not cause the test to break because exitreset() is never called at the wrong time". If there's some test that breaks, it would need to be one where the original exitstatus needs to be restored by exitreset(), *and* the original exitstatus is not already restored by dotrap(). The original exitstatus is not restored by dotrap() only if evalskip == SKIPFUNC (i.e. return with argument), in which case it shouldn't be restored, or it exits with an exception. And exitreset() is only called when control gets back to main() via an exception. So you'd need an example where an exception is raised after the return command successfully completes without arguments, yet before the function the return command is in (if any) actually returns, where that exception would then reach main(). I struggle to come up with such an example.
Re: [v2 PATCH] eval: Only restore exit status on exit/return
On 14/12/2018 03:10, Herbert Xu wrote: On Sat, Dec 08, 2018 at 03:45:11PM +, Harald van Dijk wrote: --- a/src/eval.c +++ b/src/eval.c @@ -116,10 +116,7 @@ INCLUDE "eval.h" EXITRESET { evalskip = 0; loopnest = 0; - if (savestatus >= 0) { - exitstatus = savestatus; - savestatus = -1; - } + savestatus = -1; } #endif The reason this is needed is to support a naked return. With your patch a naked return would either have to always return the saved status or the new status. Neither of which is what we want. I actually saw the commit you referenced already and tested that with my patch before sending. This is how I tested it: Returns 1: f() { false; return } f Returns 0: f() { trap "false; return" USR1 kill -USR1 $$ } f The former picks up the status of the false command, the latter of the kill command. This works because although returncmd() only looks at exitstatus, so returns the wrong value, in the no-argument version it sets skip to SKIPFUNCDEF, causing dotrap() to restore the original exitstatus value, whereas in the version that does take arguments, skip is set to SKIPFUNC, causing dotrap() to leave exitstatus alone.
Re: [v2 PATCH] eval: Only restore exit status on exit/return
On 06/12/2018 21:35, Harald van Dijk wrote: On 04/12/2018 23:57, Harald van Dijk wrote: This has the benefit of fixing one other test case, a small modification from one of Martijn Dekker's: $SHELL -c 'trap "set -o bad@option" INT; kill -s INT $$' && echo BUG Another test case, one that still fails: trap exit INT trap 'true; kill -s INT $$' EXIT false Here, inside the EXIT handler, "the command that executed immediately preceding the trap action" is `false`, but inside the INT handler, it's either `true` or `kill -s INT $$` (I think the latter, but it doesn't matter). dash treats it as if it were still `false`. The attached patch fixes this. In short, it assumes that if the EXIT action raises an exception, exitstatus will have been set appropriately, and modifies exitcmd() to set it. It also simplifies dotrap() by removing the unnecessary special handling of recursive calls. It handles the following test cases, from this thread: exec 2>/dev/null $SHELL -c 'trap "(false) && echo BUG1" INT; kill -s INT $$' $SHELL -c 'trap "(false) && echo BUG2" EXIT' $SHELL -c 'trap "(false); echo \$?" EXIT' $SHELL -c 'trap "(true) || echo BUG3" INT; false; kill -s INT $$' $SHELL -c 'trap "(true) || echo BUG4" EXIT; false' $SHELL -c 'trap "(true); echo \$?" EXIT; false' $SHELL -c 'trap "(! :) && echo BUG5" EXIT' $SHELL -c 'trap "(false) && echo BUG6" EXIT' $SHELL -c 'trap "readonly foo=bar; (foo=baz) && echo BUG7" EXIT' $SHELL -c 'trap "(set -o bad@option) && echo BUG8" EXIT' $SHELL -c 'trap "set -o bad@option" EXIT' && echo BUG9 $SHELL -c 'trap "set -o bad@option" INT; kill -s INT $$' && echo BUG10 $SHELL -c 'trap exit INT; trap "true; kill -s INT $$" EXIT; false' || echo BUG11 It does not change the behaviour for these test cases, also from this thread: $SHELL -c 'trap "(trap \"echo exit\" EXIT; :)" EXIT' $SHELL -c 'trap "(:; exit) && echo exit" EXIT; false' It can be combined with the other patches to also change the behaviour of those two. --- a/src/eval.c +++ b/src/eval.c @@ -116,10 +116,7 @@ INCLUDE "eval.h" EXITRESET { evalskip = 0; loopnest = 0; - if (savestatus >= 0) { - exitstatus = savestatus; - savestatus = -1; - } + savestatus = -1; } #endif --- a/src/main.c +++ b/src/main.c @@ -348,7 +348,9 @@ exitcmd(int argc, char **argv) return 0; if (argc > 1) - savestatus = number(argv[1]); + exitstatus = number(argv[1]); + else if (savestatus >= 0) + exitstatus = savestatus; exraise(EXEXIT); /* NOTREACHED */ --- a/src/trap.c +++ b/src/trap.c @@ -320,17 +320,14 @@ void dotrap(void) char *p; char *q; int i; - int status, last_status; + int status; if (!pending_sig) return; status = savestatus; - last_status = status; - if (likely(status < 0)) { - status = exitstatus; - savestatus = status; - } + savestatus = exitstatus; + pending_sig = 0; barrier(); @@ -350,10 +347,10 @@ void dotrap(void) continue; evalstring(p, 0); if (evalskip != SKIPFUNC) - exitstatus = status; + exitstatus = savestatus; } - savestatus = last_status; + savestatus = status; } @@ -390,8 +387,10 @@ exitshell(void) savestatus = exitstatus; TRACE(("pid %d, exitshell(%d)\n", getpid(), savestatus)); - if (setjmp(loc.loc)) + if (setjmp(loc.loc)) { + savestatus = exitstatus; goto out; + } handler = if ((p = trap[0])) { trap[0] = NULL;
Re: [PATCH] clear_traps: reset savestatus
On 29/11/2018 20:28, Martijn Dekker wrote: Op 29-11-18 om 20:56 schreef Harald van Dijk: That's part of it, but not the whole story. Herbert Xu's comment about exitshell() was right, that is still a problem. A testcase for this is trap '(true) || echo bug' EXIT Yes. Thanks. I hadn't thought about that. The test case above is not quite complete. It needs to make sure the exit status is non-zero before executing the trap: $ src/dash -c "trap '(true) || echo bug' EXIT; false" bug I had intended it as a testcase for dash with your change applied, in which case it fails even without making sure of that since you hit _exit(savestatus) with savestatus reset to -1, but I like that it's so easy to modify to also fail on unpatched dash, so thanks for that. By the way, my change has an unintended but possibly acceptable side effect: trap '(trap "echo exit" EXIT; :)' EXIT This prints nothing with current dash, but prints "exit" with my change. It also prints "exit" in ksh, mksh, posh, and bosh. - Martijn
Re: [PATCH] clear_traps: reset savestatus
On 29/11/2018 15:45, Martijn Dekker wrote: Op 27-11-18 om 17:24 schreef Martijn Dekker: Big bad bug: it appears that subshells always return status 0 in traps. As posted elsewhere, looks like the problem is simply that savestatus ("/* exit status of last command outside traps */") isn't reset to -1 upon resetting traps when forking a subshell. That's part of it, but not the whole story. Herbert Xu's comment about exitshell() was right, that is still a problem. A testcase for this is trap '(true) || echo bug' EXIT Your patch can be extended to handle that though, by skipping exitshell()'s handler if savestatus got reset to -1. Cheers, Harald van Dijk diff --git a/src/trap.c b/src/trap.c index ab0ecd46..eef44f1d 100644 --- a/src/trap.c +++ b/src/trap.c @@ -169,6 +169,7 @@ clear_traps(void) } trapcnt = 0; INTON; + savestatus = -1; } @@ -387,11 +388,18 @@ exitshell(void) { struct jmploc loc; char *p; + struct jmploc *volatile savehandler; savestatus = exitstatus; TRACE(("pid %d, exitshell(%d)\n", getpid(), savestatus)); - if (setjmp(loc.loc)) + savehandler = handler; + if (setjmp(loc.loc)) { + if (savestatus == -1) { + handler = savehandler; + longjmp(handler->loc, 1); + } goto out; + } handler = if ((p = trap[0])) { trap[0] = NULL;
Re: [PATCH] var: ensure variables are fully initialised when unset
On 12/11/2018 12:53, Ron Yorston wrote: When a variable is unset by calling setvar(name, 0, 0) the code to initialise the new, empty variable omits the trailing '='. It's supposed to. A trailing = means the variable is set to an empty string. That's different from unset. You can see the difference with set -u, or with ${var+set}. However, ... Attempts to read the contents of the unset variable will result in the uninitialised character at the end of the string being accessed. ...this is indeed a bug which I've noticed as well. The code needs two trailing null bytes, not just one. Because of glibc internals, the out-of-bounds byte being read will almost certainly be zero on x86-64, but it's not a guarantee, and it could probably break more easily on other platforms. It only affects shell-internal uses of variables, only for variables explicitly unset by a script (rather than unset by default), only for uses where the code does not explicitly check for unset beforehand. As far as scripts go, that just means PATH (as you found) I think, for interactive shells there are some more variables such as PS1/PS2/PS4/MAIL. My patch is attached. Cheers, Harald van Dijk diff --git a/src/var.c b/src/var.c index 0d7e1db0..9163cca9 100644 --- a/src/var.c +++ b/src/var.c @@ -206,9 +206,9 @@ struct var *setvar(const char *name, const char *val, int flags) vallen = strlen(val); } INTOFF; - p = mempcpy(nameeq = ckmalloc(namelen + vallen + 2), name, namelen); + p = mempcpy(nameeq = ckmalloc(namelen + vallen + 2), name, namelen + 1); if (val) { - *p++ = '='; + p[-1] = '='; p = mempcpy(p, val, vallen); } *p = '\0';
Re: Negative LINENO possible with indirection
On 17/09/2018 23:46, Max Rees wrote: Hello, While running the shunit2 test suite[1][2] against dash, the following problem presented itself: when multiple layers of indirection are used, $LINENO yields a negative number. It appears that the root of the problem is that lineno is decremented in multiple places in src/eval.c with no lower bound, but I wasn't sure how to fix that. A minimal test case follows below. Hi, A more minimal test case is : : f() { eval echo \$LINENO; } f When this script is parsed, the function definition gets an associated line number 3, and the eval command invocation also gets an associated line number 3. POSIX says LINENO Set by the shell to a decimal number representing the current sequential line number (numbered starting with 1) within a script or function before it executes each command. If the user unsets or resets LINENO, the variable may lose its special meaning for the life of the shell. If the shell is not currently executing a script or function, the value of LINENO is unspecified. This volume of POSIX.1-2017 specifies the effects of the variable only for systems supporting the User Portability Utilities option. I have interpreted this as saying that when a script defines a function, inside that function, lines are numbered starting from 1 again. So when the eval command is executed, LINENO should be set to 1. This is achieved by subtracting the function definition's line number (less one). This is also how zsh has interpreted the spec. However, most other shells agree with bash, which says: x. The expansion of $LINENO inside a shell function is only relative to the function start if the shell is interactive -- if the shell is running a script, $LINENO expands to the line number in the script. This is as POSIX-2001 requires. Continuing with the original interpretation: When the eval command executes, the echo command is parsed in isolation. It gets an associated line number 1. But it is executed in the context of the function f, so f's associated line number is taken off again. The result is 1 - (3 - 1) == -1. There are multiple possible approaches to change this behaviour, depending on the intended behaviour. 1: Function definitions could be ignored for LINENO purposes if the function definitions are in a script. Since that takes out the subtraction, it would at least get rid of the negative numbers. 2: Function definitions' commands could be ignored for LINENO purposes. This preserves the outer-most command's line number. Since that also takes out the subtraction, it would at least get rid of the negative numbers. 3: eval's commands could be executed as if they are not in the context of a function. This would mean that eval 'echo $LINENO' always prints 1. 4: eval's commands could be executed as if they are in the context of a function with a faked associated line number, so that echo $LINENO and eval 'echo $LINENO' always print the same number, but eval 'echo $LINENO echo $LINENO' prints two different line numbers. 5: The commands eval sees after parsing could get a fixed associated line number, so that eval 'echo $LINENO echo $LINENO' always prints '0 0' or '1 1'. 6: The commands eval sees after parsing could get eval's associated line number, so that echo $LINENO and eval 'echo $LINENO' always print the same number, and eval 'echo $LINENO echo $LINENO' prints the same line number twice. As long as the user never modifies LINENO, this can also be equivalently stated as: the commands eval sees after parsing could avoid setting LINENO. Testing the behaviour of various shells shows, taking dash as the baseline: bash | 1 4 bosh | 2 5 ksh | 1 3 mksh | 1 6 pdksh | 1 5 posh | 1 5 yash | 1 3 zsh | 6 Personally, I am in favour of 1. It is more sensible behaviour and I only implemented it the way it is now because of what POSIX said. If a different, equally reasonable interpretation can get a more desirable outcome, I would suggest going with that interpretation. It involves a bit more than just taking out the subtractions: it needs a bit of thinking to figure out how to ensure LINENO does get set properly in functions in interactive shells. As for the rest, unless there are strong reasons to go after a particular result, since POSIX says the results are unspecified, I would suggest going with whatever is the simplest to implement. Cheers, Harald van Dijk Please CC me on replies - I'm not subscribed. Max
Re: [BUG] failure to push/restore closed file descriptor
On 23/04/2018 19:56, Martijn Dekker wrote: $ dash -c '{ exec 8Apparently, dash either fails to push the file descriptor onto the stack at '} 8<&-', or fails to restore it. Same bug with loops ending in "done 8<&-". Confirmed in all dash versions down to 0.5.5.1. What surprises me most is that dash has code written specifically to keep the fd closed. dash would be smaller and simpler if it behaved the way you expected and the way most other shells behave: just remove all traces of REALLY_CLOSED. FWIW, this also happens with n< which is ignored entirely. The behaviour I would expect for that (and which I have implemented, but which I am pretty sure has no chance anyway of getting into dash) is to issue an error message if that fd is not open, and to save and later restore the fd if it is open. Test cases: : 8<&8 Assuming fd 8 does not happen to be open already, bash and dash are the only shells I've tested which accept this. ksh93, mksh, pdksh, yash, zsh, bosh all reject it. echo ok | { { exec 0<&-; } 0<&0; cat; } Here, bash and dash are in agreement again, but this time they're not alone: mksh and posh also cause an error message by leaving stdin closed. The other shells restore stdin and print ok. There is one more corner case on the subject of redirections, which behaves unexpectedly in two shells in two different ways: echo ok | cat 0>&0 Almost all shells are okay with this. The exceptions are bosh and yash: yash rejects the redirection as fd 0 is not writeable, bosh appears to mishandle all 0>* redirections, instead treating them as 1>*. Cheers, Harald van Dijk
Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion
On 3/29/18 6:42 AM, Herbert Xu wrote: On Wed, Mar 28, 2018 at 03:06:32PM +0200, Harald van Dijk wrote: Since bash itself is inconsistent, and POSIX unclear, What exactly is unclear about the sentence of POSIX that I quoted? Is there a legitimate interpretation of "It is unspecified whether A or B" that allows other behaviour than A or B? The tricky part is coming up with a way to actually produce an unescaped backslash. You're right that what other shells implement doesn't allow any possibility of unescaped backslashes in the shell. But if what they do is what's intended, the sentence still makes perfect sense to me: it simply wouldn't affect the shell. Remember that the description of pattern matching applies to both the shell and the non-shell cases. For the non-shell cases, it's trivial, and it's possible that that was all it was intended to address: find . -name 'a\' Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 13:00, Herbert Xu wrote: On Wed, Mar 28, 2018 at 12:53:31PM +0200, Harald van Dijk wrote: [un-snip] If a pattern ends with an unescaped , it is unspecified whether the pattern does not match anything or the pattern is treated as invalid. I don't think this allows dash's behaviour of taking the backslash as a literal, since that still allows a match to succeed. bash lets such a pattern never match. In other shells, there is no way to get into this situation, but GNU find behaves the same as bash. Nope, bash treats it as if the backslash is not there. $ pwd /home/dash $ touch asdf\\ $ touch asdf $ bash -c 'v="../da*sh/asdf\\"; printf "%s\n" $v' ../dash/asdf $ Huh. Indeed in that case, but at the same time: bash -c 'v="?sdf\\"; printf "%s\n" $v' Here, it will not be considered to match, this just prints ?sdf\. I suspect bash has the same kind of logic as dash, where it just tries to lstat() the file if the final pathname component contains no metacharacters or goes through an opendir()/readdir()/closedir() sequence if it does, and mishandles this corner case. [...] Since bash itself is inconsistent, and POSIX unclear, What exactly is unclear about the sentence of POSIX that I quoted? Is there a legitimate interpretation of "It is unspecified whether A or B" that allows other behaviour than A or B? Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 12:37, Herbert Xu wrote: On Wed, Mar 28, 2018 at 12:03:24PM +0200, Harald van Dijk wrote: When expanded backslashes are no longer treated as quoted, this would call expmeta() with the pattern *\, that is with a single unquoted trailing backslash, so: [...] Thanks for the explanation. Here is an updated patch. That seems consistent with what pmatch() does for trailing unquoted backslashes, but actually, I think pmatch() is wrong on that too. POSIX says: If a pattern ends with an unescaped , it is unspecified whether the pattern does not match anything or the pattern is treated as invalid. I don't think this allows dash's behaviour of taking the backslash as a literal, since that still allows a match to succeed. bash lets such a pattern never match. In other shells, there is no way to get into this situation, but GNU find behaves the same as bash. Test case: v=\\; case \\ in $v) echo bug;; esac bash prints nothing, dash 0.5.9.1 prints bug. Other shells don't count since they interpret $v differently. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 11:52, Herbert Xu wrote: On Wed, Mar 28, 2018 at 08:44:28AM +0200, Harald van Dijk wrote: Test case: $v='*\' set -- $v I don't see how this would cause an overrun, can you please explain it for me? Line numbers are from 0.5.9.1. When expanded backslashes are no longer treated as quoted, this would call expmeta() with the pattern *\, that is with a single unquoted trailing backslash, so: expand.c:1333 if (*p == '\\') esc++; if (p[esc] == '/') { The first if statement will be hit and set esc to 1. p[esc] is then '\0', so the second if block doesn't get entered and the outer loop continues: expand.c:1315 for (p = name; esc = 0, *p; p += esc + 1) { p += esc + 1 will increase p by 2, letting it point just past the terminating '\0'. The loop condition of *p now accesses the byte just past the pattern. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 10:16, Herbert Xu wrote: On Wed, Mar 28, 2018 at 09:38:04AM +0200, Harald van Dijk wrote: Also, it's just as logical to restore the case pattern matching to its traditional behaviour to align it with pathname expansion. No I think that makes no sense and clearly contradicts POSIX. That's not clear at all. As I already wrote earlier and Jilles Tjoelker chimed in with as well, if there isn't a single shell out there that actually implements what POSIX specifies (bash gets close, but isn't fully there either), not even the shells on which the POSIX specification was based, there's a fair chance it's a POSIX defect, regardless of whether it makes sense. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 09:31, Herbert Xu wrote: On Wed, Mar 28, 2018 at 08:52:57AM +0200, Harald van Dijk wrote: I seriously cannot understand the logic of pushing a break from traditional ash behaviour and from all shells apart from bash, giving POSIX as a reason for doing it, and then giving the behaviour of all those other shells as a reason for not doing it properly. It's logical for dash because this aligns the behaviour of pathname expansion with case pattern matching. Except it doesn't actually do that. It does in some cases, and not in others. You've made it clear that you don't care about the cases where it doesn't. That can be a valid decision, but it's one you should acknowledge. Also, it's just as logical to restore the case pattern matching to its traditional behaviour to align it with pathname expansion. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 08:18, Herbert Xu wrote: On Tue, Mar 27, 2018 at 08:38:01PM +0200, Harald van Dijk wrote: My inclination is to just drop the /d\ev issue and use the bash behaviour until such a time that bash changes or POSIX changes. What would need to change in POSIX to convince you to change dash to match what you were already arguing is the POSIX-mandated behaviour? :) No the passage I quoted only mandates that backslashes be interpreted when doing the matching. It doesn't say anything about whether it should be removed after matching is done. The only thing it does say is that: If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged. IOW it's silent in the case where the pattern does match. Of course it doesn't say whether characters in the pattern are preserved in the case of a match. That's because when there's a match, pathname expansion produces the file name, not the original pattern. In a directory containing foo.txt, you wouldn't argue that *.txt might expand to *foo.txt just because POSIX doesn't explicitly say the * is removed, would you? However, the other reason this is not worth fixing is because all those other shells that always treat the backslash as a literal won't remove it in this case anyway. I seriously cannot understand the logic of pushing a break from traditional ash behaviour and from all shells apart from bash, giving POSIX as a reason for doing it, and then giving the behaviour of all those other shells as a reason for not doing it properly. If all those other shells are a reason against doing it, then why isn't that a reason for just restoring the dash 0.5.4 behaviour that all those other shells implement, always taking expanded backslash as effectively quoted? So nor sensible script could possibly use such a feature even if we implemented it. No portable script could use such a feature. A sensible script certainly could. There are a lot of scripts that aren't meant to be portable across shells. I know I've written scripts for my own use that use dash features that don't work the same in bash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 28/03/2018 08:23, Herbert Xu wrote: On Wed, Mar 28, 2018 at 12:19:17AM +0200, Harald van Dijk wrote: This introduces a buffer overread. When expmeta() sees a backslash, it assumes it can just skip the next character, assuming the next character is not a forward slash. By treating expanded backslashes as unquoted, it becomes possible for the next character to be the terminating '\0'. This code has always had to deal with naked backslashes. Can you show me the exact pattern that results in the overread? No, it hasn't, because expmeta() is not used in case patterns, and case patterns are currently the only case where naked backslashes can appear. In contexts where pathname expansion is performed, a backslash coming from a variable will be escaped by another backslash in currently released dash versions. Test case: $v='*\' set -- $v Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 3/27/18 10:55 AM, Herbert Xu wrote: So going back to dash yes I think we should adopt the bash behaviour for pathname expansion and keep the existing case semantics. This patch does exactly that. Note that this patch does not work unless you have already applied https://patchwork.kernel.org/patch/10306507/ because otherwise the optimisation mentioned above does not get detected correctly and we will end up doing quote removal twice. This introduces a buffer overread. When expmeta() sees a backslash, it assumes it can just skip the next character, assuming the next character is not a forward slash. By treating expanded backslashes as unquoted, it becomes possible for the next character to be the terminating '\0'. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 3/27/18 11:32 PM, Jilles Tjoelker wrote: I don't think it is clear at all. Note the final paragraph of 2.13.1: ] When pattern matching is used where shell quote removal is not ] performed [...] This implies that special characters cannot be escaped using a backslash in a context where shell quote removal is performed. Taken literally, it just says something about what happens when shell quote removal is not performed. In the cases we're talking about, shell quote removal is performed, so this simply wouldn't apply. Perhaps that's taking it more literally than intended, I don't know. Instead, special characters can be escaped using shell quoting. As a result, the simplest form of parameter expansion has either all or none of the generated characters quoted (depending on whether the expansion is in double-quotes or not). There is also a sentence "The shell special characters always require quoting." which seems untrue in practice if the characters come from an expansion. Something like touch 'a' sh -c 'w=*\&*; printf "%s\n" $w' works for many shells as sh. However, this could be explained away as undefined behaviour. This is what allows extensions to glob syntax, if those extensions use shell special characters. p="*(ab)"; case abab in $p) echo match ;; esac This prints "match" in ksh93. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 20:24, Herbert Xu wrote: On Tue, Mar 27, 2018 at 08:01:10PM +0200, Harald van Dijk wrote: Right. I guess we could change it so that CTLESC is preserved to distinguish between the backslashes from parameter expansion vs. backslashes from open input. Thinking about it some more, see below. Actually let's not go down this route because this would mean that the glob(3) path will never have the same functionality since it cannot possibly see CTLESC. Yes, that's why I ended up proposing what I put below, which does allow them the same functionality. Neither glob() nor expmeta() would see CTLESC, and neither would need to, since neither would need to optimise for the no-metachars case. It was just the most basic command I could think of that shouldn't hit the FS, currently doesn't hit the FS, and would start hitting the FS if backslash gets taken as a metacharacter. Anything containing a quoted metacharacter would do. I suppose I could have used echo "[ ok ]" instead for something that's more likely to pop up in a real script. My inclination is to just drop the /d\ev issue and use the bash behaviour until such a time that bash changes or POSIX changes. What would need to change in POSIX to convince you to change dash to match what you were already arguing is the POSIX-mandated behaviour? :) More serious response: I do think it's worth raising this as a bash bug too and seeing what they do. It's such an unlikely case as why would anyone knowingly put backslash into a variable and expecting pathname expansion to remove it without actually using real magic characters. That's true. The least unlikely scenario I can think of is a directory name containing [ but not ]. A script may end up escaping the [ in that case just as a precaution, even though it's not strictly needed, and glob() may return early due to GLOB_NOMAGIC picking up on the fact that [ is not magic in that context. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 19:02, Herbert Xu wrote: On Tue, Mar 27, 2018 at 06:48:45PM +0200, Harald van Dijk wrote: Backslashes coming from parameters, sure, but backslashes introduced by preglob(), I'm not so sure. Right. I guess we could change it so that CTLESC is preserved to distinguish between the backslashes from parameter expansion vs. backslashes from open input. Thinking about it some more, see below. The [ character would mark the whole string possibly-meta in expandmeta(), and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and expmeta() wouldn't take [ as a meta character, but would take \ as a meta character. Oh I though you were talking about test/[, I didn't see the set. But why would someone do set "["? It was just the most basic command I could think of that shouldn't hit the FS, currently doesn't hit the FS, and would start hitting the FS if backslash gets taken as a metacharacter. Anything containing a quoted metacharacter would do. I suppose I could have used echo "[ ok ]" instead for something that's more likely to pop up in a real script. Then, how about moving some of expmeta()'s checks to return early outside of it, so that they can also be done in the glob() case? We could. expandmeta() is implemented twice, once for the glob() case, once for the expmeta() case. There is not much code shared between them except for preglob(). preglob(), through _rmescapes(), already has to go over the whole string and check each character. If that can be made to return an indication somehow of whether it has seen metacharacters, that could be used for both cases. Perhaps an additional flag that, when set, lets it return NULL (and freeing the allocated string) if no metacharacters were seen? That means expmeta() doesn't need to learn about CTLESC, means there's no need to go over the string a second time in the glob() case to really turn CTLESC into \, and means GLOB_NOMAGIC becomes unnecessary and can be removed as a workaround for glibc bugs. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 18:04, Herbert Xu wrote: On Tue, Mar 27, 2018 at 05:54:53PM +0200, Harald van Dijk wrote: I was thinking about not making backslashes set metaflag in expmeta(): when the pathname component doesn't include *, ?, or [, but does include backslashes, then the if (metaflag == 0) block could handle that as long as it performs the lstat64() check unconditionally. There's no need to go through the opendir()/readdir()/closedir() path for that case. Since expmeta() is bypassed for words not containing any potentially-magic characters, the impact might be small enough. Honestly such backslashes should be rare enough that I wouldn't bother with such an optimisation. Backslashes coming from parameters, sure, but backslashes introduced by preglob(), I'm not so sure. Regardless of whether metaflag is set, it would mean things like 'set "["' would start hitting the FS unnecessarily, if I understand correctly: the preglob()-expanded pattern is '\[', and expmeta() can no longer tell this apart from $v where v='\[' where a FS check would be required. No it shouldn't. We only mark [ is meta if there is a matching ]. The [ character would mark the whole string possibly-meta in expandmeta(), and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and expmeta() wouldn't take [ as a meta character, but would take \ as a meta character. Perhaps preglob() should just be avoided, and expmeta() taught to respect both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit, CTLESC wouldn't require it. This would also avoid some memory allocations. We need preglob for glob(3). I want to minimise the amount of code difference between the glob path and the expmeta path. Then, how about moving some of expmeta()'s checks to return early outside of it, so that they can also be done in the glob() case? if (!strpbrk(str->text, metachars)) goto nometa; in expandmeta() is done before preglob() is called. Here, it is still not necessary to include CTLESC. We could move it after preglob. Assuming backslash is added to metachars, which was needed anyway: Regardless of whether the check is done before or after preglob(), it would never skip expmeta() when it shouldn't. If it's kept before preglob(), then it will correctly skip expmeta() in more cases than if it's moved after it, so I don't think there's any benefit in moving it. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 17:14, Herbert Xu wrote: On Tue, Mar 27, 2018 at 02:57:12PM +0200, Harald van Dijk wrote: That's a good point and wouldn't have too much of an impact on performance of existing scripts. BTW, that means both expandmeta()'s metachars variable, and expmeta()'s if (metaflag == 0 || lstat64(expdir, ) >= 0) optimisation. You already got rid of the latter in your patch to prevent buffer overflows. No the purpose of this metaflag check is to bypass the lstat call. My patch simply made it bypass the call by returning earlier. Oh, right. It is not relevant to whether we include backslash as a metacharacter. I was thinking about not making backslashes set metaflag in expmeta(): when the pathname component doesn't include *, ?, or [, but does include backslashes, then the if (metaflag == 0) block could handle that as long as it performs the lstat64() check unconditionally. There's no need to go through the opendir()/readdir()/closedir() path for that case. Since expmeta() is bypassed for words not containing any potentially-magic characters, the impact might be small enough. Regardless of whether metaflag is set, it would mean things like 'set "["' would start hitting the FS unnecessarily, if I understand correctly: the preglob()-expanded pattern is '\[', and expmeta() can no longer tell this apart from $v where v='\[' where a FS check would be required. Perhaps preglob() should just be avoided, and expmeta() taught to respect both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit, CTLESC wouldn't require it. This would also avoid some memory allocations. In regular /d\ev that doesn't come from a variable, the backslash will have been replaced by CTLESC, but it's not necessary to treat CTLESC as a metacharacter: in that case, either there's a match and pathname expansion produces /dev, or there's no match and quote removal produces /dev, so the optimisation is still valid. It'd only affect backslashes that come from a variable, so it's very limited in scope. For the case where the backslash came from the parser, the CTLESC would have already been removed by preglob so there should be no difference. if (!strpbrk(str->text, metachars)) goto nometa; in expandmeta() is done before preglob() is called. Here, it is still not necessary to include CTLESC. The only problem with the metacharacter approach is that glibc's glob(3) probably won't treat it as a metacharacter and therefore it will still behave in the same way as the current bash. This doesn't matter much yet, but yeah, it will if you decide to enable --enable-glob by default in the future. As long as you don't include GLOB_NOESCAPE, then normally, it should be taken as escaping the next character, but it might still trigger GLOB_NOMAGIC to return early. If it does, I'd take that as a glibc bug, but that doesn't help dash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 14:19, Herbert Xu wrote: Nowhere does it say that backslashes that come from the result of parameter expansion is always literal. So it's clear that any shell that treats the backslash as a literal backslash is already broken. By the literal POSIX wording, yes, but the fact remains that that's what all shells aside from bash were already doing, so that may mean it's a POSIX defect. That's why I qualified with "POSIX specifies and/or intends to specify". But I'm tempted to agree with your and Martijn's point of view now that it's better to just implement it the way POSIX specifies, that it's more useful that way. On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote: Can you show me any shell other than bash that lets this optimisation affect the results? The fact is that the other shells are not doing what they do because they're not doing this optimisation. They do what they do because they have broken POSIX because they always treat backslashes that arise from parameter expansion as literal backslashes. Right, but they're doing this optimisation as well. FWIW it wouldn't be hard to iron out the anomalous case of /d\ev. You just have treat the backslash as a metacharacter. That's a good point and wouldn't have too much of an impact on performance of existing scripts. BTW, that means both expandmeta()'s metachars variable, and expmeta()'s if (metaflag == 0 || lstat64(expdir, ) >= 0) optimisation. You already got rid of the latter in your patch to prevent buffer overflows. In regular /d\ev that doesn't come from a variable, the backslash will have been replaced by CTLESC, but it's not necessary to treat CTLESC as a metacharacter: in that case, either there's a match and pathname expansion produces /dev, or there's no match and quote removal produces /dev, so the optimisation is still valid. It'd only affect backslashes that come from a variable, so it's very limited in scope. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Do not quote backslashes in unquoted parameter expansion
On 27/03/2018 11:44, Herbert Xu wrote: On Tue, Mar 27, 2018 at 11:16:29AM +0200, Harald van Dijk wrote: If you say that quote removal takes place on the original token (meaning before parameter expansion), and if parameter expansion takes place before pathname expansion, then there's nothing left to allow \* to behave differently from *. Either you misunderstood me or you misread POSIX. I misunderstood you. Given v='\' and the word \\$v, I took the result of parameter expansion as \\\ where the first two backslashes come from the original word, you take the result of parameter expansion as only \. Because of that, when you wrote quote removal is never applied to the results of parameter expansion, I didn't see what you meant. I think you're right in your terminology, the result of parameter expansion is just \, to get \\\ it would need to be phrased as something like "the result of applying parameter expansion". Thanks for the clarification. POSIX never actually says this optimisation is allowed. The only thing that allows it is the fact that it produces the same results anyway. If that stops, then checking the file system for matches becomes required. It doesn't disallow it either. If POSIX specifies a result, and a shell applies an optimisation that causes a different result to be produced, doesn't that inherently disallow that optimisation? There may be some confusion and/or disagreement over what exactly POSIX specifies and/or intends to specify, but I don't see how you can argue that POSIX specifies result A, but it's okay to apply an optimisation that produces result B. Can you show me a shell that doesn't apply this optimisation? Can you show me any shell other than bash that lets this optimisation affect the results? Like I wrote in my initial message, all other shells I tested take a backslash that comes from a variable after parameter expansion as effectively quoted. Given your b="/*/\null", bash is the only shell I know that expands $b to /dev/null. All others do perform pathname expansion, but check to see if a file \null exists in any top-level directory. Given a="/de[v]/\null" and b="/dev/\null", all shells other than bash, even dash, agree that $a and $b expand to '/dev/\null' if that file exists, or the original string if that file doesn't exist. *That* is what allows the optimisation in the case of $b: expanding to either '/dev/\null' or '/dev/\null', depending on some condition, can be done without evaluating that condition. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Backslashes in unquoted parameter expansions
On 26/03/2018 11:34, Martijn Dekker wrote: Op 25-03-18 om 22:56 schreef Harald van Dijk: case /dev in $pat) echo why ;; esac Now, bash and dash say that the pattern does match -- they take the backslash as unquoted, allowing it to escape the v. Most other shells (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as quoted. This doesn't make sense to me, and doesn't match historic practice: [...] With the snipping it's not clear that I was specifically confused by the inconsistency. I had included another example: pat="/de\v" printf "%s\n" $pat I can understand treating backslash as quoted, or treating it as unquoted, but not quoted-unless-in-a-case-statement. What justifies this one exception? Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix buffer overflow in expandmeta
On 3/26/18 7:12 AM, Herbert Xu wrote: FWIW your patch when built with -O2 with gcc 4.7.2 is actually 16 bytes bigger than my version. Interesting. Not sure what might cause that. @@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name) metaflag++; p = name; do { + if (enddir == expdir + PATH_MAX) + return; if (*p == '\\') p++; *enddir++ = *p; Also there is another loop in between these two hunks that also needs to check enddir. Indeed, thanks. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix buffer overflow in expandmeta
On 3/26/18 4:54 AM, Herbert Xu wrote: On Mon, Mar 26, 2018 at 03:03:38AM +0200, Harald van Dijk wrote: This was already the case before your patch, but on systems that flat out reject paths longer than PATH_MAX (that includes GNU/Linux), it seems weird that expansion can produce paths which then don't work. PATH_MAX only applies in certain cases. Besides, you can always cd into the directories one level at a time to circumvent it. In other words, when you change the path to a different one than what globbing produced. That was kind of my point. :) Besides, even if we did impose a limit here we'd still have to check that limit at every recursion level which means that the code won't be that much simpler. Proof of concept attached. The resulting code is about 100 bytes smaller at -Os than the dynamic reallocation approach. It's possible to further reduce that with a statically allocated buffer, but of course that means a constant memory cost at run time. Cheers, Harald van Dijk --- a/src/expand.c +++ b/src/expand.c @@ -122,7 +122,7 @@ STATIC void expandmeta(struct strlist *, int); #ifdef HAVE_GLOB STATIC void addglob(const glob_t *); #else -STATIC void expmeta(char *, char *); +STATIC void expmeta(char *); STATIC struct strlist *expsort(struct strlist *); STATIC struct strlist *msort(struct strlist *, int); #endif @@ -1237,9 +1237,6 @@ addglob(pglob) #else /* HAVE_GLOB */ -STATIC char *expdir; - - STATIC void expandmeta(struct strlist *str, int flag) { @@ -1261,13 +1258,7 @@ expandmeta(struct strlist *str, int flag) INTOFF; p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP); - { - int i = strlen(str->text); - expdir = ckmalloc(i < 2048 ? 2048 : i); /* XXX */ - } - - expmeta(expdir, p); - ckfree(expdir); + expmeta(p); if (p != str->text) ckfree(p); INTON; @@ -1296,7 +1287,7 @@ nometa: */ STATIC void -expmeta(char *enddir, char *name) +expmeta1(char *expdir, char *enddir, char *name) { char *p; const char *cp; @@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name) metaflag++; p = name; do { + if (enddir == expdir + PATH_MAX) +return; if (*p == '\\') p++; *enddir++ = *p; @@ -1390,22 +1383,34 @@ expmeta(char *enddir, char *name) if (dp->d_name[0] == '.' && ! matchdot) continue; if (pmatch(start, dp->d_name)) { + p = enddir; + cp = dp->d_name; + for (;;) { +if (p == expdir + PATH_MAX) + goto toolong; +if ((*p++ = *cp++) == '\0') + break; + } if (atend) { -scopy(dp->d_name, enddir); addfname(expdir); } else { -for (p = enddir, cp = dp->d_name; - (*p++ = *cp++) != '\0';) - continue; p[-1] = '/'; -expmeta(p, endname); +expmeta1(expdir, p, endname); } } +toolong: ; } closedir(dirp); if (! atend) endname[-esc - 1] = esc ? '\\' : '/'; } + +STATIC void +expmeta(char *name) +{ + char expdir[PATH_MAX]; + expmeta1(expdir, expdir, name); +} #endif /* HAVE_GLOB */
Re: expand: Fix buffer overflow in expandmeta
On 3/25/18 10:38 AM, Herbert Xu wrote: The native version of expandmeta allocates a buffer that may be overrun for two reasons. First of all the size is 1 byte too small but this is normally hidden because the minimum size is rounded up to 2048 bytes. Secondly, if the directory level is deep enough, any buffer can be overrun. This patch fixes both problems by calling realloc when necessary. This was already the case before your patch, but on systems that flat out reject paths longer than PATH_MAX (that includes GNU/Linux), it seems weird that expansion can produce paths which then don't work. Test case: cd /tmp x=x x=$x$x$x$x$x$x$x$x$x$x$x$x$x$x$x ln -s . $x set -- $x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x* case $1 in *"*") echo "no match" ;; *)if test -e "$1" then echo "match and exists" else echo "match but does not exist" fi ;; esac rm $x I'm seeing "no match" from bash/mksh/pdksh/posh/yash/zsh, but "match but does not exist" from dash/bosh/ksh93. Other commands would naturally print the "File name too long" error. Should the buffer perhaps simply be limited to PATH_MAX, taking care to just give up if something longer is found? That could avoid the need to handle reallocations and result in somewhat smaller and simpler code. The downside would be that if other systems do support paths longer than PATH_MAX, globbing would no longer work for those paths. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix glibc glob(3) support
On 3/25/18 10:06 AM, Herbert Xu wrote: It's been a while since we disabled glob(3) support by default. It appears to be working now, however, we have to change our code to remove a workaround that now actually prevents it from working. In particular, we no longer use GLOB_NOMAGIC or test GLOB_MAGCHAR. I thought that was an optimisation to avoid unnecessary file system access, to ensure that simple commands such as "echo ok" don't bother looking for files named "echo" and "ok" in the current directory. It indeed doesn't work, but I took that to be a glibc bug. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC/Feature request: UTF-8 support
On 3/22/18 8:12 PM, Harald van Dijk wrote: I have a local patch set of stuff that I haven't considered in an appropriate state to send to the list. One of the features I've got working is multibyte locale support. It doesn't assume UTF-8 (only assumes stateless encodings) and is tolerant of malformed strings. I've got it working in pattern matching, in glob() sorting, in trimming (#, ##, % and %%), in field splitting, and in "$*" joining. I'll see if I can polish it a bit and make it available when I've got some more free time again, but I'm not sure how soon that will be. Actually, there's no real harm in just publishing what I have now. https://github.com/hvdijk/dash This is my personal playground, so I'll have to include some disclaimers: Commits may take a different approach than what dash is interested in. Commits may be poorly tested. Commits may contain bugs. Commits may have non-obvious dependencies on earlier commits. Commits may be poor style. Commits may be re-written and force-pushed. That said, feel free to see if the locale support does what you're after. If so, it should be possible to bring into a shape suitable for dash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix ghost fields with unquoted $@/$*
On 23/03/2018 14:14, Herbert Xu wrote: On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote: It doesn't, and I didn't say it did. POSIX doesn't care how it's implemented, POSIX cares about the results produced. When another approach produces the same results, both approaches are equally correct. Right, but restoring the old behaviour for white spaces only is silly. It's silly to have this work: set -- A1 B2 C3 echo ${@%2 C3} but not this: set -- A1 B2 C3 IFS=: echo ${@%2:C3} Agreed. Both are doable in a POSIX-compatible way, but the second would require a different approach: either scanning the strings to suppress the addition of the separator character in those cases where its presence would cause splitting to produce an additional field, or big changes in how regions are recorded and split. Either way, that's a large cost for a feature that's likely seen very little real-world use. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix ghost fields with unquoted $@/$*
On 23/03/2018 13:20, Herbert Xu wrote: On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote: Right. I think I'll leave this one alone. It worked purely by chance previously because dash incorrectly used the first IFS character even when it's outside of double-quotes, which is why the expansion ${@%2 C3} matches. When the first IFS character is a whitespace character, which it usually is, it's not incorrect: inserting that character and then splitting on it produces the exact results that POSIX requires. It's only when IFS doesn't start with a whitespace character that it's incorrect. Where does POSIX require this? It doesn't, and I didn't say it did. POSIX doesn't care how it's implemented, POSIX cares about the results produced. When another approach produces the same results, both approaches are equally correct. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix ghost fields with unquoted $@/$*
On 23/03/2018 10:10, Herbert Xu wrote: On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote: Also the above. But it breaks a traditional ash extension: unset IFS set -- A1 B2 C3 echo ${@%2 C3} This used to print A1 B, but prints A1 B2 C3 with your patch. echo ${@%2} This used to print A1 B2 C3, but prints A1 B with your patch. Hmm, it still does on my machine: Apologies, I ended up sending the wrong test case. It's when IFS has its default value that the behaviour is changed. It's when it's unset that it still works. Initial testing was against 0.5.9.1 plus your patch, now retested against c166b71 plus your patch. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: expand: Fix ghost fields with unquoted $@/$*
On 23/03/2018 05:37, Herbert Xu wrote: You're right. The proper fix to this is to ensure that nulonly is not set in varvalue for $*. It should only be set for $@ when it's inside double quotes. In fact there is another bug while we're playing with $@/$*. When IFS is set to a non-whitespace character such as :, $* outside quotes won't remove empty fields as it should. That's not limited to empty fields: IFS=, set -- , , set -- $@ echo $# This should print 2, not 3. (bash gets this wrong too.) This patch fixes both problems. Also the above. But it breaks a traditional ash extension: unset IFS set -- A1 B2 C3 echo ${@%2 C3} This used to print A1 B, but prints A1 B2 C3 with your patch. echo ${@%2} This used to print A1 B2 C3, but prints A1 B with your patch. This extension was already pretty much broken within quoted expansions: "${@%2}" would already expand to A1 B. Your patch extends that to the non-quoted case. I do not know if you want to keep this extension in. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] don't record empty IFS scan regions
On 22/03/2018 22:38, Martijn Dekker wrote: Op 22-03-18 om 20:28 schreef Harald van Dijk: On 22/03/2018 03:40, Martijn Dekker wrote: This patch fixes the bug that, given no positional parameters, unquoted $@ and $* incorrectly generate one empty field (they should generate no fields). Apparently that was a side effect of the above. This seems weird though. If you want to remove the recording of empty regions because they are pointless, then how does removing them fix a bug? Doesn't this show that empty regions do have an effect? Perhaps they're not supposed to have any effect, perhaps it's a specific combination of empty regions and something else that triggers some bug, and perhaps that combination can no longer occur with your patch. The latter is my guess, but I haven't had time to investigate it. Looking into it again: When IFS is set to an empty string, sepc is set to '\0' in varvalue(). This then causes *quotedp to be set to true, meaning evalvar()'s quoted variable is turned on. quoted is then passed to recordregion() as the nulonly parameter. ifsp->nulonly has a bigger effect than merely selecting whether to use $IFS or whether to only split on null bytes: in ifsbreakup(), nulonly also causes string termination to be suppressed. That's correct: that special treatment is required to preserve empty fields in "$@" expansion. But it should *only* be used when $@ is quoted: ifsbreakup() takes nulonly from the last IFS region, even if it's empty, so having an additional zero-length region with nulonly enabled causes confusion. Passing quoted by value to varvalue() and not attempting to modify it should therefore, and in my quick testing does, also work to fix the original $@ bug. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC/Feature request: UTF-8 support
On 22/03/2018 22:01, Martijn Dekker wrote: Op 22-03-18 om 20:12 schreef Harald van Dijk: Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any busybox code is imported into dash, that from then on dash can only be distributed under GPL? I thought dash was already effectively under the GPL due to including the output of mksignames.c. To be honest, I'm getting confused now. I knew that the output of GPL programs is not generally covered by the GPL itself. And that's all that COPYING says: "mksignames.c: This file is not directly linked with dash. However, its output is." But about the not "generally" covered: as mentioned on <https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#WhatCaseIsOutputGPL>, if the output includes part of the GPL-licensed code, then the output is GPL-licensed as well. And admittedly, signames.c does contain part of mksignames.c. It seems unclear to packagers, at any rate: Fedora and Gentoo say dash is BSD-licensed. MacPorts lists it as GPL-2+. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] don't record empty IFS scan regions
On 22/03/2018 03:40, Martijn Dekker wrote: evalvar() records empty expansion results (varlen == 0) as string regions that need to be scanned for IFS characters. This is pointless, because there is nothing to split. varlen may be set to -1 when an unset variable is expanded. If it's beneficial to avoid recording empty regions to scan for IFS characters, should those also be excluded? This patch fixes the bug that, given no positional parameters, unquoted $@ and $* incorrectly generate one empty field (they should generate no fields). Apparently that was a side effect of the above. This seems weird though. If you want to remove the recording of empty regions because they are pointless, then how does removing them fix a bug? Doesn't this show that empty regions do have an effect? Perhaps they're not supposed to have any effect, perhaps it's a specific combination of empty regions and something else that triggers some bug, and perhaps that combination can no longer occur with your patch. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC/Feature request: UTF-8 support
On 22/03/2018 16:36, Martijn Dekker wrote: Busybox ash [...] The licenses are bidirectionally compatible. Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any busybox code is imported into dash, that from then on dash can only be distributed under GPL? I have a local patch set of stuff that I haven't considered in an appropriate state to send to the list. One of the features I've got working is multibyte locale support. It doesn't assume UTF-8 (only assumes stateless encodings) and is tolerant of malformed strings. I've got it working in pattern matching, in glob() sorting, in trimming (#, ##, % and %%), in field splitting, and in "$*" joining. I'll see if I can polish it a bit and make it available when I've got some more free time again, but I'm not sure how soon that will be. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Expansion-lookalikes in heredoc delimiters
On 15/03/2018 18:11, Herbert Xu wrote: On Thu, Mar 15, 2018 at 05:29:27PM +0100, Harald van Dijk wrote: That's because POSIX specifies that after ${, everything up to the matching }, not including nested strings, expansions, etc., is part of the word. No exception is made when it spans multiple lines. Another instance of this is if false; then echo ${ fi } fi echo ok This is accepted by bash, ksh, zsh and posh. Interestingly, ksh bombs out on this: echo ${ fi } So this behaviour is not exactly consistent. It's perfectly consistent. It gets accepted at parse time, it only gets rejected at expansion time. That's how dash generally behaves as well: $ dash -c 'echo ${x^}' dash: 1: Bad substitution $ dash -c ': || echo ${x^}' $ Historically, as I understand it, ash would reject both of these, but you specifically modified dash to accept invalid expansions during parsing (which makes sense): <https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3df3edd13389ae768010bfacee5612346b413e38> In any case, this substituion is invalid in all of these shells so does it really matter? Okay, it can be trivially modified to something that does work in other shells (even if it were actually executed), but gets rejected at parse time by dash: if false; then : ${$+ } fi It'd be nice if all shells used the same parse rules, so that scripts can detect dynamically which expansions are supported, but don't have to go through ugly eval commands to use them. That's because bash doesn't ever match multi-line heredoc delimiters. Which is what POSIX technically requires: "The here-document shall be treated as a single word that begins after the next and continues until there is a line containing only the delimiter and a , with no characters in between." No single line will ever match a multi-line heredoc delimiter. Sure. But this is a quality of implementation issue. If you're never going to match the delimter you should probably emit an error at the very start rather than at the EOF. POSIX isn't clear on whether reaching EOF without seeing the heredoc delimiter is an error and shells disagree. dash silently accepts it, as do ksh, yash and zsh. When EOF is an acceptable way to terminate a heredoc, a delimiter which never matches can be useful to force the complete remainder of the file to be treated as a heredoc. Another interesting thing to try is cat << ${ foo ${ Also you can look at the quotes: cat << " " IOW it's not clear that "word" in this context necessarily follows the same rules used during normal tokenisation. Quotes are clear: as far as they go, this *must* follow the same rules used during normal tokenisation. Does any shell disagree? I was talking about multi-line quotes, which you conveniently dismissed :) I got back to that :) I picked another example first because I thought it would be clearer using that that the normal tokenisation rules should be used. For your example, does any shell, including dash, *not* treat this as a multi-line heredoc delimiter consisting of a newline character (which may or may not be matched by two blank lines, depending on the shell)? Yes, almost every shell fails with the multi-line delimiter inside double quotes. Only dash and ksh93 seem to get it right. posh accepts multi-line heredoc delimiters as well. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Expansion-lookalikes in heredoc delimiters
On 15/03/2018 15:52, Herbert Xu wrote: On Thu, Mar 15, 2018 at 12:41:10PM +0100, Harald van Dijk wrote: It is if you want to do it the way POSIX specifies. You're adding a special exception in the parser. I don't see how this approach can be extended to handle the other examples in my mail: I don't think it's exactly clear what POSIX says on this. I don't think it's clear what POSIX means to say, but I do think it's clear what it does say. cat <<`one two` ok `one two` Try this one: cat << ${ ${ bash/ksh93 both will get stuck forever until a right brace is entered. That's because POSIX specifies that after ${, everything up to the matching }, not including nested strings, expansions, etc., is part of the word. No exception is made when it spans multiple lines. Another instance of this is if false; then echo ${ fi } fi echo ok This is accepted by bash, ksh, zsh and posh. However, in this instance, I'm having trouble finding where, but IIRC, POSIX says or means to say that it's okay to flag this as an invalid parameter expansion at parse time even though the evaluation would otherwise be skipped. While other shells all behave the way dash does. All? At least two others don't: yash rejects ${ as invalid because of the missing parameter name. zsh agrees with bash and ksh that it should look for the closing }. Considering the fact that even if you closed the brace after the newline bash would still be stuck forever, That's because bash doesn't ever match multi-line heredoc delimiters. Which is what POSIX technically requires: "The here-document shall be treated as a single word that begins after the next and continues until there is a line containing only the delimiter and a , with no characters in between." No single line will ever match a multi-line heredoc delimiter. I think this behaviour is suboptimal. ksh93 seems to do the right thing though. Yes, I agree that multi-line heredoc delimiters are useful. Another interesting thing to try is cat << ${ foo ${ Also you can look at the quotes: cat << " " IOW it's not clear that "word" in this context necessarily follows the same rules used during normal tokenisation. Quotes are clear: as far as they go, this *must* follow the same rules used during normal tokenisation. Does any shell disagree? cat << "A ' B" ok 1 A ' B echo ok 2 I expect this to print ok 1 ok 2 and I'm not finding any shell that disagrees, not even dash. Do you have an example of a shell that sees only "A as the heredoc delimiter, or one that perhaps performs quote removal on the inner '? For your example, does any shell, including dash, *not* treat this as a multi-line heredoc delimiter consisting of a newline character (which may or may not be matched by two blank lines, depending on the shell)? Anyway, dash has had CHKEOFMARK since 2007 so this is just an extension of existing behaviour. This is definitely true: your patch, regardless of whether it matches POSIX's requirements, is consistent with how dash has behaved for a long time. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/8/18 1:40 AM, Harald van Dijk wrote: If the syntax stack is to be stored on the actual stack, then real recursion could be used instead, as attached. Even though it won't be accepted in dash, I continued with this approach for my own use. I've now got it to about 1800 bytes smaller (at -Os -s). After the other changes I'd done, it became apparent to me that the syntax tables were unnecessary, and that they'd become fairly easy to get rid of. This was a big space saver that may be possible to apply to your version as well. Cheers, Harald van Dijk diff --git a/src/expand.c b/src/expand.c index 2a50830..acd5fdf 100644 --- a/src/expand.c +++ b/src/expand.c @@ -83,7 +83,7 @@ #define RMESCAPE_HEAP 0x10 /* Malloc strings instead of stalloc */ /* Add CTLESC when necessary. */ -#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT) +#define QUOTES_ESC (EXP_FULL | EXP_CASE) /* Do not skip NUL characters. */ #define QUOTES_KEEPNUL EXP_TILDE @@ -115,8 +115,8 @@ STATIC char *exptilde(char *, char *, int); STATIC void expbackq(union node *, int); STATIC const char *subevalvar(char *, char *, int, int, int, int, int); STATIC char *evalvar(char *, int); -STATIC size_t strtodest(const char *, const char *, int); -STATIC void memtodest(const char *, size_t, const char *, int); +STATIC size_t strtodest(const char *, int); +STATIC void memtodest(const char *, size_t, int); STATIC ssize_t varvalue(char *, int, int, int *); STATIC void expandmeta(struct strlist *, int); #ifdef HAVE_GLOB @@ -333,16 +333,6 @@ addquote: case CTLESC: startloc++; length++; - - /* - * Quoted parameter expansion pattern: remove quote - * unless inside inner quotes or we have a literal - * backslash. - */ - if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) == - EXP_QPAT && *p != '\\') -break; - goto addquote; case CTLVAR: p = evalvar(p, flag | inquotes); @@ -396,7 +386,7 @@ done: if (!home || !*home) goto lose; *p = c; - strtodest(home, SQSYNTAX, quotes); + strtodest(home, quotes | EXP_QUOTED); return (p); lose: *p = c; @@ -521,7 +511,6 @@ expbackq(union node *cmd, int flag) char *p; char *dest; int startloc; - char const *syntax = flag & EXP_QUOTED ? DQSYNTAX : BASESYNTAX; struct stackmark smark; INTOFF; @@ -535,7 +524,7 @@ expbackq(union node *cmd, int flag) if (i == 0) goto read; for (;;) { - memtodest(p, i, syntax, flag & QUOTES_ESC); + memtodest(p, i, flag & (QUOTES_ESC | EXP_QUOTED)); read: if (in.fd < 0) break; @@ -651,8 +640,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla char *(*scan)(char *, char *, char *, char *, int , int); argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ? - (flag & (EXP_QUOTED | EXP_QPAT) ? - EXP_QPAT : EXP_CASE) : 0)); + EXP_CASE : 0)); STPUTC('\0', expdest); argbackq = saveargbackq; startp = stackblock() + startloc; @@ -844,7 +832,7 @@ end: */ STATIC void -memtodest(const char *p, size_t len, const char *syntax, int quotes) { +memtodest(const char *p, size_t len, int quotes) { char *q; if (unlikely(!len)) @@ -855,11 +843,17 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) { do { int c = (signed char)*p++; if (c) { - if ((quotes & QUOTES_ESC) && - ((syntax[c] == CCTL) || - (((quotes & EXP_FULL) || syntax != BASESYNTAX) && - syntax[c] == CBACK))) -USTPUTC(CTLESC, q); + if (quotes & QUOTES_ESC) { +switch (c) { + case '\\': + case '!': case '*': case '?': case '[': case '=': + case '~': case ':': case '/': case '-': case ']': + if (quotes & EXP_QUOTED) + case CTLVARS: + USTPUTC(CTLESC, q); + break; +} + } } else if (!(quotes & QUOTES_KEEPNUL)) continue; USTPUTC(c, q); @@ -870,13 +864,10 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) { STATIC size_t -strtodest(p, syntax, quotes) - const char *p; - const char *syntax; - int quotes; +strtodest(const char *p, int quotes) { size_t len = strlen(p); - memtodest(p, len, syntax, quotes); + memtodest(p, len, quotes); return len; } @@ -895,15 +886,13 @@ varvalue(char *name, int varflags, int flags, int *quotedp) int sep; char sepc; char **ap; - char const *syntax; int quoted = *quotedp; int subtype = varflags & VSTYPE; int discard = subtype == VSPLUS || subtype == VSLENGTH; - int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL; + int quotes = quoted | (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL; ssize_t len = 0; sep = (flags & EXP_FULL) << CHAR_BIT; - syntax = quoted ? DQSYNTAX : BASESYNTAX; switch (*name) { case '$': @@ -946,11 +935,11 @@ param: if (!(ap = shellparam.p)) return -1; while ((p = *ap++)) { - len += strtodest(p, syntax, quotes); + le
Re: [PATCH] parser: Fix single-quoted patterns in here-documents
On 3/9/18 4:07 PM, Herbert Xu wrote: On Thu, Mar 08, 2018 at 07:35:53PM +0100, Harald van Dijk wrote: Related: x=*; cat < I don't think this is related to our patches at all. Not related to our patches, but related to the original bug. It's another instance where quoted * is wrongly treated as unquoted. Your fix looks good to me. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Expansion-lookalikes in heredoc delimiters
Consider this: cat <<`bad` `bad` As far as I can tell, this is technically valid, supposed to print nothing, and accepted in most other shells. POSIX Token Recognition says `bad` is to be recognised as a single token of type word, and any word can act as a heredoc delimiter. POSIX Here-Document says only quote removal is performed on the word. However, dash does not always preserve the original spelling of the word. That's what's going on here. Because dash hasn't preserved the `bad` spelling (it's been turned into CTLBACKQ), the check for the end of the heredoc doesn't pick it up, and it instead gets taken as a command substitution. When an actual 0x84 byte is seen in the input, *that* gets taken as the heredoc delimiter: dash -c "tail -n1 <<\`:\` `printf \\\204` ok \`:\`" In a locale in which 0x84 is a valid character (since dash doesn't support locales, that's easy, it's always valid), it's supposed to print "ok". dash instead interprets the second line as the end of the heredoc and subsequently issues an error message when it interprets "ok" on line 3 as a command to execute. This is pretty clearly a case that no serious script is ever going to encounter, not to mention one that many shells don't even attempt to support, at least not completely, so I don't think this is a real problem. I'm mentioning it anyway because I was trying to come up with a few more test cases for the parser, and I think it's good to have a record not only of what worked, what has been made to work, and what got broken, but also of what's never going to be work. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 08/03/2018 08:55, Herbert Xu wrote: On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote: If the syntax stack is to be stored on the actual stack, then real recursion could be used instead, as attached. I tried to avoid recursion for the cases that unpatched dash already handled properly. It does look a lot simpler than I expected. However, this patch is still 30% bigger than my patch :) That's a bit unfair: that's mostly due to moved code, not to changed code. But size-wise, your patch still wins: parser.o does become larger with my patch than with yours, largely due to my attempts to only recurse when needed. As real recursion doesn't seem to buy us much I think I'll stick with the syntax stack for now. @@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) c != '\\' && c != '`' && c != '$' && ( c != '"' || - eofmark != NULL + (eofmark != NULL && !varnest) + ) && ( + c != '}' || + !varnest || + (dqvarnest ? innerdq : dblquote) So this seems to be the only substantial difference between your patch and mine. I have looked at the behaviour of other shells and I think I will stick with my patch's behaviour for now. The first line of this part of my patch is about something else: x=\"; cat < In general, if there are disagreements between shells and the standard is not specific enough, I'll pick the approach that results in simpler code. Fair enough. Cheers, Harald van Dijk Thanks, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash tested against ash testsuite: 17 failures
On 3/8/18 7:30 AM, Herbert Xu wrote: Could you please resend these patches as two separate emails please? Patchwork cannot handle two patches in one email: https://patchwork.kernel.org/patch/10264661/ Ah, didn't realise that. I'll keep that in mind for future mails. Actually, I'll withdraw my second patch for now, I spotted another problem in alias handling (already present in 0.5.9.1) and want to double-check that my patch didn't make things worse. Also it would be nice if you could include the patch descriptions in each email as these will go into the git tree. parser: use pgetc_eatbnl() in more places. dash has a pgetc_eatbnl function in parser.c which skips any backslash-newline combinations. It's not used everywhere it could be. There is also some duplicated backslash-newline handling elsewhere in parser.c. Replace most of the calls to pgetc() with calls to pgetc_eatbnl() and remove the duplicated backslash-newline handling. diff --git a/src/parser.c b/src/parser.c index 382658e..8b945e3 100644 --- a/src/parser.c +++ b/src/parser.c @@ -106,6 +106,7 @@ STATIC void parseheredoc(void); STATIC int peektoken(void); STATIC int readtoken(void); STATIC int xxreadtoken(void); +STATIC int pgetc_eatbnl(); STATIC int readtoken1(int, char const *, char *, int); STATIC void synexpect(int) __attribute__((__noreturn__)); STATIC void synerror(const char *) __attribute__((__noreturn__)); @@ -656,8 +657,10 @@ parseheredoc(void) if (needprompt) { setprompt(2); } - readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, -here->eofmark, here->striptabs); + if (here->here->type == NHERE) + readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs); + else + readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs); n = (union node *)stalloc(sizeof (struct narg)); n->narg.type = NARG; n->narg.next = NULL; @@ -782,7 +785,7 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ - c = pgetc(); + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': case PEOA: @@ -791,30 +794,23 @@ xxreadtoken(void) while ((c = pgetc()) != '\n' && c != PEOF); pungetc(); continue; - case '\\': - if (pgetc() == '\n') { -nlprompt(); -continue; - } - pungetc(); - goto breakloop; case '\n': nlnoprompt(); RETURN(TNL); case PEOF: RETURN(TEOF); case '&': - if (pgetc() == '&') + if (pgetc_eatbnl() == '&') RETURN(TAND); pungetc(); RETURN(TBACKGND); case '|': - if (pgetc() == '|') + if (pgetc_eatbnl() == '|') RETURN(TOR); pungetc(); RETURN(TPIPE); case ';': - if (pgetc() == ';') + if (pgetc_eatbnl() == ';') RETURN(TENDCASE); pungetc(); RETURN(TSEMI); @@ -822,11 +818,9 @@ xxreadtoken(void) RETURN(TLP); case ')': RETURN(TRP); - default: - goto breakloop; } + break; } -breakloop: return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -836,7 +830,7 @@ static int pgetc_eatbnl(void) int c; while ((c = pgetc()) == '\\') { - if (pgetc() != '\n') { + if (pgetc2() != '\n') { pungetc(); break; } @@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) attyline(); if (syntax == BASESYNTAX) return readtoken(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; } #endif @@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); -c = pgetc(); +c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; /* continue outer loop */ case CWORD: USTPUTC(c, out); @@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(CTLESC, out); USTPUTC('\\', out); pungetc(); -} else if (c == '\n') { - nlprompt(); } else { if ( dblquote && @@ -997,7 +989,7 @@ quotemark: USTPUTC(c, out); --parenlevel; } else { - if (pgetc() == ')') { + if (pgetc_eatbnl() == ')') { USTPUTC(CTLENDARI, out); if (!--arinest) syntax = prevsyntax; @@ -1025,7 +1017,7 @@ quotemark: USTPUTC(c, out); } } - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); } } endword: @@ -1132,7 +1124,7 @@ parseredir: { np = (union node *)stalloc(sizeof (struct nfile)); if (c == '>') { np->nfile.fd = 1; - c = pgetc(); + c = pgetc_eatbnl(); if (c == '>') np->type = NAPPEND; else if (c == '|') @@ -1145,7 +1137,7 @@ parseredir: { } } else { /* c == '<' */ np->nfile.fd = 0; - switch (c = pgetc()) { + switch (c = pgetc_eatbnl()) { case '<': if (sizeof (struct nfile) != sizeof (struct nhere)) { np = (union node *)stalloc(sizeof (struct nhere)); @@ -1154,7 +1146,7 @@ parseredir: { np->type = NHERE;
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/7/18 8:04 PM, Harald van Dijk wrote: On 3/7/18 5:29 PM, Herbert Xu wrote: On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote: Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1 )) ) )). This should expand to 1, but gives a hard error in dash, again due to the non-recursive nature of dash's parser. A small generalisation of what I've been doing so far could handle this, so it makes sense to me to try to achieve this at the same time. Thanks for the patches. You have convinced that a stacked syntax is indeed necessary. However, I'd like to do the reversion of the run-time quote book-keeping patch in a separate step. I also didn't quite like the idea of scanning the string backwards to find the previous syntax. So here is my attempt at the recursive parsing using alloca. If the syntax stack is to be stored on the actual stack, then real recursion could be used instead, as attached. I tried to avoid recursion for the cases that unpatched dash already handled properly. It's not completely recursive in that I've maintained the existing behaviour of double-quotes inside parameter expansion inside double-quotes. However, it does seem to address most of the issues that you have raised. One thing I found it doesn't handle, although it does look like you try to support it, is ${$-"}"}, which should expand to the same thing as $$. This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so synstack->innerdq doesn't get set, and there's nothing else that prevents the quoted } from being treated as the end of the variable substitution. To handle that it can just look at dblquote instead, included in the attached. In this patch, I managed let "${$+\}}" expand to }, but "${$+"\}"}" to \}, for the reasons I gave earlier. Martijn Dekker's test case of 'x=0; x=$((${x}+1))' also works with this. Cheers, Harald van Dijk diff --git a/src/expand.c b/src/expand.c index 2a50830..903e250 100644 --- a/src/expand.c +++ b/src/expand.c @@ -83,7 +83,7 @@ #define RMESCAPE_HEAP 0x10 /* Malloc strings instead of stalloc */ /* Add CTLESC when necessary. */ -#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT) +#define QUOTES_ESC (EXP_FULL | EXP_CASE) /* Do not skip NUL characters. */ #define QUOTES_KEEPNUL EXP_TILDE @@ -333,16 +333,6 @@ addquote: case CTLESC: startloc++; length++; - - /* - * Quoted parameter expansion pattern: remove quote - * unless inside inner quotes or we have a literal - * backslash. - */ - if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) == - EXP_QPAT && *p != '\\') -break; - goto addquote; case CTLVAR: p = evalvar(p, flag | inquotes); @@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla char *(*scan)(char *, char *, char *, char *, int , int); argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ? - (flag & (EXP_QUOTED | EXP_QPAT) ? - EXP_QPAT : EXP_CASE) : 0)); + EXP_CASE : 0)); STPUTC('\0', expdest); argbackq = saveargbackq; startp = stackblock() + startloc; @@ -1644,7 +1633,6 @@ char * _rmescapes(char *str, int flag) { char *p, *q, *r; - unsigned inquotes; int notescaped; int globbing; @@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag) q = mempcpy(q, str, len); } } - inquotes = 0; globbing = flag & RMESCAPE_GLOB; notescaped = globbing; while (*p) { if (*p == (char)CTLQUOTEMARK) { - inquotes = ~inquotes; p++; notescaped = globbing; continue; } + if (*p == '\\') { + /* naked back slash */ + notescaped = 0; + goto copy; + } if (*p == (char)CTLESC) { p++; if (notescaped) *q++ = '\\'; - } else if (*p == '\\' && !inquotes) { - /* naked back slash */ - notescaped = 0; - goto copy; } notescaped = globbing; copy: diff --git a/src/expand.h b/src/expand.h index 26dc5b4..90f5328 100644 --- a/src/expand.h +++ b/src/expand.h @@ -55,7 +55,6 @@ struct arglist { #define EXP_VARTILDE 0x4 /* expand tildes in an assignment */ #define EXP_REDIR 0x8 /* file glob for a redirection (1 match only) */ #define EXP_CASE 0x10 /* keeps quotes around for CASE pattern */ -#define EXP_QPAT 0x20 /* pattern in quoted parameter expansion */ #define EXP_VARTILDE2 0x40 /* expand tildes after colons only */ #define EXP_WORD 0x80 /* expand word in parameter expansion */ #define EXP_QUOTED 0x100 /* expand word in double quotes */ diff --git a/src/parser.c b/src/parser.c index 382658e..17d60b0 100644 --- a/src/parser.c +++ b/src/parser.c @@ -827,7 +827,8 @@ xxreadtoken(void) } } breakloop: - return readtoken1(c, BASESYNTAX, (char *)NULL, 0); + readtoken1(c, BASESYNTAX, (char *)NULL, 0); + return lasttoken; #undef RETURN } @@ -855,47 +856,147 @@ static int pgetc_eatbnl(void) * word which marks the end of the doc
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/7/18 5:29 PM, Herbert Xu wrote: On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote: Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1 )) ) )). This should expand to 1, but gives a hard error in dash, again due to the non-recursive nature of dash's parser. A small generalisation of what I've been doing so far could handle this, so it makes sense to me to try to achieve this at the same time. Thanks for the patches. You have convinced that a stacked syntax is indeed necessary. However, I'd like to do the reversion of the run-time quote book-keeping patch in a separate step. I also didn't quite like the idea of scanning the string backwards to find the previous syntax. So here is my attempt at the recursive parsing using alloca. Nice approach. It's not completely recursive in that I've maintained the existing behaviour of double-quotes inside parameter expansion inside double-quotes. However, it does seem to address most of the issues that you have raised. One thing I found it doesn't handle, although it does look like you try to support it, is ${$-"}"}, which should expand to the same thing as $$. This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so synstack->innerdq doesn't get set, and there's nothing else that prevents the quoted } from being treated as the end of the variable substitution. As I have not reverted the run-time quote book-keeping, the handling of $@/$* remains unchanged. I'll try to re-do those fixes based on the approach you're going for here. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash tested against ash testsuite: 17 failures
On 3/7/18 7:51 AM, Herbert Xu wrote: On Wed, Mar 07, 2018 at 07:49:16AM +0100, Harald van Dijk wrote: This was wrong in the original patch, but I'm not seeing it in the updated patch that you replied to. When parsing a heredoc where part of delimiter is quoted, syntax==SQSYNTAX. Since the calls to pgetc_eatbnl() are conditional on syntax!=SQSYNTAX, there shouldn't be a problem. It would be a different story if the delimiter could be an unquoted backslash, but thankfully that is not possible. Good point. In that case please resend it with the pgetc2 change and it should be good to go. Here you go. The problem with dash -c 'alias x= x' and dash -c 'alias bs=\\ bs ' looks like it just needs one extra line, so also attached as a separate patch. Cheers, Harald van Dijk diff --git a/src/parser.c b/src/parser.c index 382658e..8b945e3 100644 --- a/src/parser.c +++ b/src/parser.c @@ -106,6 +106,7 @@ STATIC void parseheredoc(void); STATIC int peektoken(void); STATIC int readtoken(void); STATIC int xxreadtoken(void); +STATIC int pgetc_eatbnl(); STATIC int readtoken1(int, char const *, char *, int); STATIC void synexpect(int) __attribute__((__noreturn__)); STATIC void synerror(const char *) __attribute__((__noreturn__)); @@ -656,8 +657,10 @@ parseheredoc(void) if (needprompt) { setprompt(2); } - readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, -here->eofmark, here->striptabs); + if (here->here->type == NHERE) + readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs); + else + readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs); n = (union node *)stalloc(sizeof (struct narg)); n->narg.type = NARG; n->narg.next = NULL; @@ -782,7 +785,7 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ - c = pgetc(); + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': case PEOA: @@ -791,30 +794,23 @@ xxreadtoken(void) while ((c = pgetc()) != '\n' && c != PEOF); pungetc(); continue; - case '\\': - if (pgetc() == '\n') { -nlprompt(); -continue; - } - pungetc(); - goto breakloop; case '\n': nlnoprompt(); RETURN(TNL); case PEOF: RETURN(TEOF); case '&': - if (pgetc() == '&') + if (pgetc_eatbnl() == '&') RETURN(TAND); pungetc(); RETURN(TBACKGND); case '|': - if (pgetc() == '|') + if (pgetc_eatbnl() == '|') RETURN(TOR); pungetc(); RETURN(TPIPE); case ';': - if (pgetc() == ';') + if (pgetc_eatbnl() == ';') RETURN(TENDCASE); pungetc(); RETURN(TSEMI); @@ -822,11 +818,9 @@ xxreadtoken(void) RETURN(TLP); case ')': RETURN(TRP); - default: - goto breakloop; } + break; } -breakloop: return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -836,7 +830,7 @@ static int pgetc_eatbnl(void) int c; while ((c = pgetc()) == '\\') { - if (pgetc() != '\n') { + if (pgetc2() != '\n') { pungetc(); break; } @@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) attyline(); if (syntax == BASESYNTAX) return readtoken(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; } #endif @@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); -c = pgetc(); +c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; /* continue outer loop */ case CWORD: USTPUTC(c, out); @@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(CTLESC, out); USTPUTC('\\', out); pungetc(); -} else if (c == '\n') { - nlprompt(); } else { if ( dblquote && @@ -997,7 +989,7 @@ quotemark: USTPUTC(c, out); --parenlevel; } else { - if (pgetc() == ')') { + if (pgetc_eatbnl() == ')') { USTPUTC(CTLENDARI, out); if (!--arinest) syntax = prevsyntax; @@ -1025,7 +1017,7 @@ quotemark: USTPUTC(c, out); } } - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); } } endword: @@ -1132,7 +1124,7 @@ parseredir: { np = (union node *)stalloc(sizeof (struct nfile)); if (c == '>') { np->nfile.fd = 1; - c = pgetc(); + c = pgetc_eatbnl(); if (c == '>') np->type = NAPPEND; else if (c == '|') @@ -1145,7 +1137,7 @@ parseredir: { } } else { /* c == '<' */ np->nfile.fd = 0; - switch (c = pgetc()) { + switch (c = pgetc_eatbnl()) { case '<': if (sizeof (struct nfile) != sizeof (struct nhere)) { np = (union node *)stalloc(sizeof (struct nhere)); @@ -1154,7 +1146,7 @@ parseredir: { np->type = NHERE; heredoc = (struct heredoc *)stalloc(sizeof (struct here
Re: dash tested against ash testsuite: 17 failures
On 3/6/18 9:45 AM, Herbert Xu wrote: On Wed, Oct 12, 2016 at 07:24:26PM +0200, Harald van Dijk wrote: I would have expected another exception to be in alias expansions that end in a backslash. Shells are not entirely in agreement there, but most appear to treat this the regular way, meaning dash -c 'alias bs=\\ bs ' prints nothing. I think your patch changes this. In order to preserve the existing behaviour (which seems logical), you should change the second pgetc call in pgetc_eatbnl to pgetc2. Oh, indeed, thanks. There's another problem: when there is no following command (as in the above example), things break. A shorter reproducer that has failed for years is $ dash -c 'alias x= x' dash: 2: Syntax error: end of file unexpected This breaks because the part where list() checks for NL/EOF, checkkwd==0, so aliases aren't expanded. Immediately after that, checkkwd is set and the next call to readtoken() would return TEOF, but by that point, dash has already committed to parsing a command. Since this is actually a long-standing problem, not something introduced by the patch, I think it's okay to ignore for now. Do you agree? With more extensive testing, the only issue I've seen is what Jilles Tjoelker had already mentioned, namely that backslash-newline should be preserved inside single-quoted strings, and also that it should be preserved inside heredocs where any part of the delimiter is quoted: cat <<\EOF \ EOF dash's parsing treats this mostly the same as a single-quoted string, and the same extra check handles both cases. Here's an updated patch. Hoping this looks okay and can be applied. I'm fine with the concept. However, your patch also breaks here- document parsing when the delimiter is a single backslash. cat << "\" \ > If you can fix these two problems it should be good to go. As Martijn Dekker wrote, this should work when the backslash is escaped or single-quoted, and in my testing does. But what you have is a nice start of another corner case: cat << "\" \ "EOF ok " EOF I'm happily surprised to see that dash accepts and gives sensible treatment to multi-line heredoc delimiters. Okay with your one extra pgetc()=>pgetc2() change, then? Cheers, Harald van Dijk Cheers, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))
On 3/6/18 2:15 AM, Martijn Dekker wrote: Op 06-03-18 om 00:23 schreef Harald van Dijk: On 3/6/18 12:23 AM, Martijn Dekker wrote: "All variables shall be initialized to zero if they are not otherwise assigned by the input to the application." In your example, x has been assigned. Its value is empty, but empty and unset are two different things. But I see now that dash gives the same error when x is unset. That part is definitely wrong for exactly the reason you point out. By default, all unset variables are considered empty, so they should act the same. Except where special behaviour is specified for unset variables, which I believed to be the case here. But see below. That brings me to another possible issue, though: $ dash -uc 'unset x; echo $((x))' 0 Shouldn't that give an "parameter not set" error under set -u? Interestingly, only bash and AT ksh93 currently do that. This is <http://austingroupbugs.net/view.php?id=559>. It's unspecified for now, but will likely become an error in the future. Also, consensus among existing shells appears to be universal. Mostly, but not fully. yash is the exception: $ yash -c 'unset x; echo $((x))' 0 $ yash -c 'x=; echo $((x))' But in POSIX mode it acts like other shells: $ yash -o posix -c 'x=; echo $((x))' 0 I stand corrected! Indeed, then it does appear that all shells are in agreement. And when POSIX is (arguably) unclear but all shells are in agreement, the shells' behaviour should generally just be taken as the correct behaviour. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/4/18 9:08 PM, Martijn Dekker wrote: Op 04-03-18 om 16:46 schreef Harald van Dijk: FreeBSD sh also prints a blank line here. [...] Like above, FreeBSD sh behaves like ksh. I stand corrected. Is there any port of FreeBSD sh to other operating systems? It would be much more convenient for me to include it in my tests if I didn't have to launch a FreeBSD VM and rsync & run the test scripts separately. None that I know of. Running the test script over ssh might be slightly less difficult, but nothing as easy as a port. The source code contains several very much FreeBSD-specific bits. Yes, the inconsistency should be fixed. Either it should be treated as quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I have no strong feelings on which it should be. Neither do I, so I would default to the behaviour that both pre-exists in dash and corresponds with the majority of other shells. I went for the behaviour that required the fewest changes for now, which is to treat them as unquoted. If it is agreed that it should be quoted, it requires some additional (minor) complications in the parser, because the existing state would no longer be sufficient to determine whether } should end the substitution. But yes, I agree that given how long dash has treated this as quoted, it makes sense to keep that, unless there's a compelling reason not to. [...] $ src/dash -c 'printf "%s\n" "${$+\}}"' \} Expected output: } (no backslash), as in bash 4, yash, ksh93, pdksh, mksh, zsh. In other words: it should be possible to escape a '}' with a backslash within the parameter expansion, even if the expansion is quoted. POSIX ref.: 2.6.2 Parameter Expansion http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02 | Any '}' escaped by a or within a quoted string, and | characters in embedded arithmetic expansions, command substitutions, | and variable expansions, shall not be examined in determining the | matching '}'. I believe this actually requires dash's behaviour. This says the first } isn't examined in determining the matching '}', but only that: it just says the parameter expansion expression is $+\}. It doesn't say the backslash is removed. I believe the word "escaped" implies that removal. If a '}' is escaped by a backslash, it's implied that the backslash is removed as this escaping is parsed, just as it's implied that quotes are removed from a quoted string. That's not implied, that's stated: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_07 The quote characters ( , single-quote, and double-quote) that were present in the original word shall be removed unless they have themselves been quoted. In this case, the backslash was quoted, so this doesn't apply. I agree that it would be much better to print } here though. All other current shells except bosh (schilytools sh) agree, too -- even FreeBSD sh, and I checked it this time. Shells agree on the simple cases to remove the backslash: ${x+\}} "${x+\}}" <In ${x+"\}"}, most shells keep the backslash. ksh and FreeBSD sh remove it. I think it makes most sense to keep it, because the general rule for \ in double-quoted strings is that it's only removed if the following character would have been special. This patch removes it. In "${x+"\}"}", of the shells that treat the \} as quoted, most shells keep the backslash. bash removes it. I think it makes most sense to keep the backslash for the same reason as above. In <zsh keep it, and FreeBSD sh treats the " as a literal. Again I think it makes most sense to keep the backslash (but remove the "). This patch removes it. Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1 )) ) )). This should expand to 1, but gives a hard error in dash, again due to the non-recursive nature of dash's parser. A small generalisation of what I've been doing so far could handle this, so it makes sense to me to try to achieve this at the same time. Cheers, Harald van Dijk diff --git a/src/Makefile.am b/src/Makefile.am index 139355e..525f8ef 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax signames.c: mksignames ./$^ -mksyntax: token.h +mksyntax: parser.h token.h $(HELPERS): %: %.c $(COMPILE_FOR_BUILD) -o $@ $< diff --git a/src/TOUR b/src/TOUR index 056e79b..f6a4641 100644 --- a/src/TOUR +++ b/src/TOUR @@ -150,6 +150,7 @@ special codes defined in parser.h. The special codes are: CTLVAR Variable substitution CTLENDVAR End of variable substitution CTLBACKQCommand substitution +CTLBACKQ|CTLQUOTE Command substitution inside double quotes CTLESC Escape next character A variable substitution contains the following elements: @@ -
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/4/18 7:05 PM, Denys Vlasenko wrote: On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk <har...@gigawatt.nl> wrote: On 02/03/2018 18:00, Denys Vlasenko wrote: On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <har...@gigawatt.nl> wrote: Currently: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' <> This is what I expect, and also what bash, ksh and posh do. With your patch: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' I was looking into this specific example and I believe it is a _bash_ bug. The [a\]] is misinterpreted by it (and probably by many people). The gist is: \] is not a valid escape for ] in set glob expression. Glob sets have no escaping at all, ] can be in a set if it is the first char: []abc], dash can be in a set if it is first or last: [abc-], [ and \ need no protections at all: [a[b\c] is a valid set of 5 chars. Therefore, "[a\]]" glob pattern means "a or \, then ]". Since that does not match "a", the result of ${foo#[a\]]}> should be "a". Are you sure about this? "Patterns Matching a Single Character"'s first paragraph contains "A character shall escape the following character. The escaping shall be discarded." The shell does this first. I have problems with "The shell does this first" statement. It's useful to view the entire discussion of glob pattern matching as a discussion of how fnmatch(pattern, string, flags) should behave (even if a particular shell implementation chose to not use C library's fnmatch() to implement its globbing). Okay. dash has an option to use fnmatch() for pattern matching. But POSIX says fnmatch() performs pattern matching the way the shell does, not the other way around. And there are a few differences that follow from it, because character quote status cannot be represented the same way in a C string as in the description of the shell. Otherwise (IOW: if you allow gobbing to depend on shell's quoting), rules for globbing for different applications will not be consistent. Which would be bad. That's already the case, and cannot be helped. The shell pattern "[a]"[a] is supposed to match a file named "[a]a", and does. But if that file exists, and I run find . -name '"[a]"[a]', I won't get any results. I could use find . -name '\[a\][a]' for that though. And dash, if built with fnmatch() support, translates it to that in order to pass it to fnmatch(). This matches what POSIX says: POSIX doesn't make the removal of " part of pattern matching, and find -name and fnmatch() *only* implement the pattern matching part of the shell. As I see it, shell should massage input according to shell rules (quote/bkslash removal et al), then use fnmatch() or glob(), or its own internal implementations of them. bash seems to not do it. It probably has a "combined" routine which does both in one step, which allows quote removal to interfere with globbing. Here's the proof: $ x='a]'; echo _${x#[a\]]}_ _]_ In the above code, what pattern should be fed to fnmatch(), assuming shell uses fnmatch() to implement ${x#pattern}? Pattern should be "[a]]" because by shell rules "\]" in an unquoted string is "]". By the normal shell rules, there are two places backslashes get removed. The first is during pattern matching. Not before, during. If fnmatch() is used to implement shell pathname expansion and other pattern matching, the string to pass here is literally [a\]]. Just like how for \*, the string to pass is literally \*, and it would be horribly wrong to pass * here as if it were unquoted. fnmatch() will remove the backslashes and take the following character literally. The second place where the shell removes backslashes is during quote removal. But this takes place after pathname expansion. But try this: $ x='a]'; echo _${x#[a]]}_ __ Here, pattern should be "[a]]" as well - it literally is. Here, the pattern should indeed be [a]]. But the results are different! Evidently, bash does _not_ perform quote removal (more precisely, backslash removal) on pattern string. Somehow, globbing code knows \ was there. (And this globbing code, in my opinion, also misinterprets [a\]] as "set of 'a' or ']'", but (a) I might be wrong on this, and (b) this is a bit offtopic, we discuss ${x#pattern} handling here). To me, it looks that bash behavior is buggy regardless of what \] means in glob patterns. These two should be equivalent: x='a]'; echo _${x#[a\]]}_ x='a]'; echo _${x#[a]]}_ because they should use the same pattern for globbing match. See above. Alternative possibility is that pattern in ${x#pattern} is not handled by the usual shell rules: backslashes are not removed. This would be VERY ugly as soon as nested variable expansions are considered. There are and there are supposed to be a few differences in how
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/4/18 4:26 PM, Martijn Dekker wrote: Op 04-03-18 om 11:44 schreef Harald van Dijk: When CTLENDVAR is seen, I double-check that syntax has the expected value. This fixes the handling of "${$+"}"}", where the inner } was seen as ending the variable substitution. Also looks like dash with your patch considers the "}" to be quoted: $ src/dash -c 'IFS=}; printf %s\\n "${$+"}"}"' } Only AT ksh93 prints an empty string there, as it doesn't consider the double quotes to be nested, so the "}" is unquoted. Ash produces a syntax error, like dash before this patch. All other shells print a }. So dash with this patch behaves like the majority. FreeBSD sh also prints a blank line here. This changes how "${x+"$y"}" get handled: POSIX is silent about whether the $y should be treated as quoted. dash has treated it as quoted for a very long time. ash has historically treated it as unquoted. With this patch, it gets treated as unquoted. That seems inconsistent with how it handles "${$+"}"}", in which the "}" is treated as quoted (see above). ksh93 is the only existing shell that treats the $y as unquoted, so I think it would be better if dash continued to treat the $y as quoted. Even if not, the inconsistency should be fixed. Like above, FreeBSD sh behaves like ksh. Yes, the inconsistency should be fixed. Either it should be treated as quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I have no strong feelings on which it should be. Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how "$@" got handled and reverting that changed it, I looked into how this works and fixed another bug. It also changes the handling of $* and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS properly at all, even if all parameters were non-empty. dash 0.5.9.1 preserves empty parameters. With this patch, they get removed just like in bash. POSIX allows for either. I don't think it does. POSIX implies that empty $@ and $* generates zero fields. 2.5.2 Special Parameters: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02 | @ |Expands to the positional parameters, starting from one, initially |producing one field for each positional parameter that is set. Since there are no positional parameters, no fields should initially be produced at all. Same for $*. So I do believe your patch correctly fixes a bug here. By empty parameters, I meant parameters that had been set to an empty string. It's covered by the 'set -- a ""' in my tests. You're right that after 'set --', unquoted $@ should not produce any fields. I hadn't even noticed that dash got that wrong and that my patch had fixed it. I would be a bit surprised if the patch is acceptable in its current form, but it's worth seeing which of the current results are definitely correct, which of the results are acceptable, which results may well be unwanted, and which special cases I missed. According to my tests (i.e. the modernish test suite), nearly everything is POSIXly correct. There's only one parameter expansion problem left that I can find, and it existed before this patch as well: $ src/dash -c 'printf "%s\n" "${$+\}}"' \} Expected output: } (no backslash), as in bash 4, yash, ksh93, pdksh, mksh, zsh. In other words: it should be possible to escape a '}' with a backslash within the parameter expansion, even if the expansion is quoted. POSIX ref.: 2.6.2 Parameter Expansion http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02 | Any '}' escaped by a or within a quoted string, and | characters in embedded arithmetic expansions, command substitutions, | and variable expansions, shall not be examined in determining the | matching '}'. I believe this actually requires dash's behaviour. This says the first } isn't examined in determining the matching '}', but only that: it just says the parameter expansion expression is $+\}. It doesn't say the backslash is removed. Then, \} is taken as part of a double-quoted string, and inside double-quoted strings, a backslash that isn't followed by $, `, ", \ or a newline is taken as a literal backslash. I agree that it would be much better to print } here though. Thanks for the testing! I'd noticed I had an off-by-one error that causes problems when an expansion ends in another expansion (${x+${x+}}). I'll try to improve it to also handle the issues you've pointed out. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 3/2/18 11:58 AM, Harald van Dijk wrote: On 02/03/2018 08:49, Herbert Xu wrote: If we fix this in the parser then everything should just work. Right, that's the approach FreeBSD sh has taken that I referred to in my message from Feb 18, that I'd personally prefer as well. It basically involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a variable expansion starts, and finding a sensible way to change the syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an explicit stack of syntaxes is created for this, but that might be avoidable: with slight modifications to what gets stored in the byte after CTLVAR/CTLARI, it might be possible to go back through the parser output to determine the syntax to revert to. I'll see if I can get that working. Since I didn't see how to avoid this approach, I went ahead with this attempt and the attached is the result. I started out by reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c. Since that also removed some dead code, I re-removed the read code. I then modified the byte after CTLVAR so that it didn't just store whether the result was quoted, it stored the prior syntax, and forcibly reset syntax to either BASESYNTAX or DQSYNTAX as appropriate. Then, in CTLENDVAR, I look for the opening CTLVAR, and use that to restore the prior syntax. The same goes for CTLARI/CTLENDARI too. When CTLENDVAR is seen, I double-check that syntax has the expected value. This fixes the handling of "${$+"}"}", where the inner } was seen as ending the variable substitution. This fixes more cases than just backslashes and single quotes: another character that's special in unquoted contexts is ~, so "${HOME#~}" should expand to an empty string. This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really answer this, I think. POSIX says that $(( "123" )) is a syntax error, but doesn't address whether " is special when it appears in other places than directly in the $((...)). Most shells accept $(( ${$+"123"} )), and with this patch, dash accepts it too. This changes how "${x+"$y"}" get handled: POSIX is silent about whether the $y should be treated as quoted. dash has treated it as quoted for a very long time. ash has historically treated it as unquoted. With this patch, it gets treated as unquoted. Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how "$@" got handled and reverting that changed it, I looked into how this works and fixed another bug. It also changes the handling of $* and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS properly at all, even if all parameters were non-empty. dash 0.5.9.1 preserves empty parameters. With this patch, they get removed just like in bash. POSIX allows for either. I would be a bit surprised if the patch is acceptable in its current form, but it's worth seeing which of the current results are definitely correct, which of the results are acceptable, which results may well be unwanted, and which special cases I missed. Cheers, Harald van Dijk command: echo "\*" bash: \* dash 0.5.8: \ \ dash 0.5.9.1: \ \ dash patched: \* command: case \\ab in "\*") echo BUG;; esac bash: dash 0.5.8: BUG dash 0.5.9.1: BUG dash patched: command: case \\a in "\?") echo BUG;; esac bash: dash 0.5.8: BUG dash 0.5.9.1: BUG dash patched: command: foo=\\; echo "<${foo#[\\]]}>" bash: <\> dash 0.5.8: <\> dash 0.5.9.1: <\> dash patched: <\> command: foo=a; echo "<${foo#[a\]]}>" bash: <> dash 0.5.8: <> dash 0.5.9.1: <> dash patched: <> command: x=yz; echo "${x#'y'}" bash: z dash 0.5.8: yz dash 0.5.9.1: yz dash patched: z command: x=yz; echo "${x+'y'}" bash: 'y' dash 0.5.8: 'y' dash 0.5.9.1: 'y' dash patched: 'y' command: x=""; echo "${x#"${x+''}"''}" bash: '' dash 0.5.8: dash 0.5.9.1: dash patched: '' command: HOME=/; echo "${HOME#~}" bash: dash 0.5.8: / dash 0.5.9.1: / dash patched: command: x="13"; echo $(( ${x#'1'} )) bash: 3 dash 0.5.8: 13 dash 0.5.9.1: 13 dash patched: 3 command: echo $(( ${$+"123"} )) bash: 123 dash 0.5.8: dash: 1: arithmetic expression: expecting primary: " "123" " dash 0.5.9.1: dash: 1: arithmetic expression: expecting primary: " "123" " dash patched: 123 command: set -- a ""; space=" "; printf "<%s>" "$@"$space bash: <> dash 0.5.8: < > dash 0.5.9.1: < > dash patched: <> command
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 17:33, Herbert Xu wrote: On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote: But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}", the ' is a quote character. This part is hard. If the above is done simply using another local variable, then the parse of the nested ${y-} would clobber that variable. I don't see why that's hard. You just need to remember whether you're in a pattern context (i.e., after a %/%% or #/##). If you are, then you need to go back to basesyntax instead of dqsyntax until the next right brace. Let me slightly modify my example so that the effect becomes different: "${x#"${y-*}"''}" When the parser has processed "${x#"${y- would the state be "in a pattern context", or "not in a pattern context"? If the state is "in a pattern context", how do you prevent the next * from being taken as unquoted? It should be taken as quoted. If it stores "not in a pattern context", and the parser is processing the last character in "${x#"${y-*} how can it reset the state to "in a pattern context"? Where could this state be stored? With arbitrarily deep nesting of variable expansions, I do not see how you can avoid requiring arbitrarily large state, which gets difficult with dash's non-recursive parser. Mind you, I do hope I'm missing something obvious here! Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 18:00, Denys Vlasenko wrote: On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <har...@gigawatt.nl> wrote: Currently: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' <> This is what I expect, and also what bash, ksh and posh do. With your patch: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' I was looking into this specific example and I believe it is a _bash_ bug. The [a\]] is misinterpreted by it (and probably by many people). The gist is: \] is not a valid escape for ] in set glob expression. Glob sets have no escaping at all, ] can be in a set if it is the first char: []abc], dash can be in a set if it is first or last: [abc-], [ and \ need no protections at all: [a[b\c] is a valid set of 5 chars. Therefore, "[a\]]" glob pattern means "a or \, then ]". Since that does not match "a", the result of ${foo#[a\]]}> should be "a". Are you sure about this? "Patterns Matching a Single Character"'s first paragraph contains "A character shall escape the following character. The escaping shall be discarded." The shell does this first. It removes the backslash (remembering that the following character is escaped) before it starts interpreting the result as a bracket expression, and so never gets to the point where the \ should be taken literally. case \] in [\]]) echo matched ;; esac prints "matched" in all shells I can check. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 16:28, Herbert Xu wrote: On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote: If we fix this in the parser then everything should just work. Right, that's the approach FreeBSD sh has taken that I referred to in my message from Feb 18, that I'd personally prefer as well. It basically involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a variable expansion starts, and finding a sensible way to change the syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an explicit stack of syntaxes is created for this, but that might be avoidable: with slight modifications to what gets stored in the byte after CTLVAR/CTLARI, it might be possible to go back through the parser output to determine the syntax to revert to. I'll see if I can get that working. Yes but that's overkill just to fix single quote within patterns. We already support nested double-quotes in patterns correctly. As single quotes cannot nest, it should be an easy fix. Single quotes indeed cannot nest, but you do need to reliably determine when a single quote is a special character, which gets very tricky very quickly with nested substitutions. In "${x+''}", the ' is the literal ' character. In "${x#''}", the ' is a quote character. This part is easy, this part is just a matter of setting another variable when the parse of the substitution starts. But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}", the ' is a quote character. This part is hard. If the above is done simply using another local variable, then the parse of the nested ${y-} would clobber that variable. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 08:49, Herbert Xu wrote: On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote: On 01/03/2018 00:04, Harald van Dijk wrote: $ bash -c 'x=yz; echo "${x#'"'y'"'}"' z $ dash -c 'x=yz; echo "${x#'"'y'"'}"' yz (That is, they are executing x=yz; echo "${x#'y'}".) POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the pattern is considered unquoted regardless of the outer quotation marks. Because of that, the single quote characters should not be taken literally, but should be taken as quoting the y. ksh, posh and zsh agree with bash. Unfortunately, this causes another problem with all of the backslash approaches so far: x=''; printf "%s\n" "${x#''}" This should print a blank line. (bash, ksh, posh and zsh agree.) Here, dash's parser stores '$\$\', where $ is a control character. preglob would need to turn this into . The problem is again that preglob cannot increase the string length. Perhaps the parser needs to store this as '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it requires some more invasive changes. These are different issues. dash's parser currently does not understand nested quoting in patterns at all. That is, if your parameter expansion are within double quotes, then dash at the parser level will consider the pattern to be double-quoted. Thus any nested single-quotes will be literals instead of actual quotes. That's the same thing though. The problem with the backslashes is also that dash sees them as double-quoted when they should be seen as unquoted, and the approach taken in commit 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was specifically to *not* fix this in the parser, but to simply have the parser record enough information so that quote status can be determined and patched up during expansion. It's just that in the case of single quotes, expansion was never modified to recognise them. Thinking some more, I don't think the parser actually records enough information to let that work. If we fix this in the parser then everything should just work. Right, that's the approach FreeBSD sh has taken that I referred to in my message from Feb 18, that I'd personally prefer as well. It basically involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a variable expansion starts, and finding a sensible way to change the syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an explicit stack of syntaxes is created for this, but that might be avoidable: with slight modifications to what gets stored in the byte after CTLVAR/CTLARI, it might be possible to go back through the parser output to determine the syntax to revert to. I'll see if I can get that working. Cheers, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 01/03/2018 00:04, Harald van Dijk wrote: $ bash -c 'x=yz; echo "${x#'"'y'"'}"' z $ dash -c 'x=yz; echo "${x#'"'y'"'}"' yz (That is, they are executing x=yz; echo "${x#'y'}".) POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the pattern is considered unquoted regardless of the outer quotation marks. Because of that, the single quote characters should not be taken literally, but should be taken as quoting the y. ksh, posh and zsh agree with bash. Unfortunately, this causes another problem with all of the backslash approaches so far: x=''; printf "%s\n" "${x#''}" This should print a blank line. (bash, ksh, posh and zsh agree.) Here, dash's parser stores '$\$\', where $ is a control character. preglob would need to turn this into . The problem is again that preglob cannot increase the string length. Perhaps the parser needs to store this as '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it requires some more invasive changes. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/28/18 11:14 PM, Denys Vlasenko wrote: Guys, while you work on fixing this, consider taking a look at some more parsing cases in this bbox bug: https://bugs.busybox.net/show_bug.cgi?id=10821 I suppose some of them may fail in dash too (I did not test, sorry). $'...' isn't supported in dash, but the underlying problem does appear to exist in dash too: $ bash -c 'x=yz; echo "${x#'"'y'"'}"' z $ dash -c 'x=yz; echo "${x#'"'y'"'}"' yz (That is, they are executing x=yz; echo "${x#'y'}".) POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the pattern is considered unquoted regardless of the outer quotation marks. Because of that, the single quote characters should not be taken literally, but should be taken as quoting the y. ksh, posh and zsh agree with bash. But: POSIX does not say the same for +/-/..., so dash is correct there: $ bash -c 'x=yz; echo "${x+'"'y'"'}"' 'y' $ dash -c 'x=yz; echo "${x+'"'y'"'}"' 'y' Because of that, for that report's follow-up comment about (unset x; echo "${x-$'\x41'}") I would not have expected this to be taken as a $'...' string. ksh (which does support $'...' strings too) prints the literal text $'\x41', and so does bash if invoked as sh. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 26/02/2018 09:40, Harald van Dijk wrote: On 26/02/2018 08:03, Harald van Dijk wrote: On 2/13/18 2:53 PM, Denys Vlasenko wrote: $ >'\' $ >'\' $ dash -c 'echo "\*"' \ \ There's another case where this goes wrong, that isn't fixed by your, my, or Herbert's patches: And hopefully I've got a good example now: $ touch '\www' $ src/dash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z' \*a a \www In both $z and ${x#$z} / ${y#$z}, $z is unquoted. In $z, it's interpreted as a literal backslash followed by a * wildcard. In ${x#$z}, it's interpreted as an escaped * character. I don't understand why these cases should be treated differently. ksh agrees with dash. bash also behaves strangely here. In both cases, it treats the * as escaped by the preceding backslash, but in $z, the backslash is preserved, whereas in ${x#$z} / ${y#$z}, the backslash is removed: $ bash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z' \*a a \* posh is more consistent: in both $z and in ${x#$z} / ${y#$z}, $z is treated as a literal backslash followed by a * wildcard: $ posh -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z' *a *a \www I'm having trouble telling from the spec what the expected behaviour is here, and because of that, whether dash's behaviour is already correct or needs to be modified. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 26/02/2018 08:03, Harald van Dijk wrote: On 2/13/18 2:53 PM, Denys Vlasenko wrote: $ >'\' $ >'\' $ dash -c 'echo "\*"' \ \ There's another case where this goes wrong, that isn't fixed by your, my, or Herbert's patches: $ dash -c 'a=\\a; echo "${a#\*}"' $ bash -c 'a=\\a; echo "${a#\*}"' \a My patch and Herbert's preserve dash's current behaviour, your patch makes it print a. None of that is correct, the result should be the same as bash. Never mind, this is a very bad test. I forgot about echo's backslash handling. Had my terminal emulator given any feedback on \a, I might have noticed that before sending. This test doesn't show anything wrong in this area, but I'll check a bit more. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/13/18 2:53 PM, Denys Vlasenko wrote: $ >'\' $ >'\' $ dash -c 'echo "\*"' \ \ There's another case where this goes wrong, that isn't fixed by your, my, or Herbert's patches: $ dash -c 'a=\\a; echo "${a#\*}"' $ bash -c 'a=\\a; echo "${a#\*}"' \a My patch and Herbert's preserve dash's current behaviour, your patch makes it print a. None of that is correct, the result should be the same as bash. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/24/18 5:52 PM, Herbert Xu wrote: On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote: It seems like the new control character doesn't get escaped in one place where it should be, and gets lost because of that: Unpatched: $ dash -c 'x=`printf \\\211X`; echo $x | cat -v' M-^IX Patched: $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v' X Hmm, it works here. Can you check that your Makefile is up-to-date It was. and the generated syntax.c looks like this: const char basesyntax[] = { CEOF,CSPCL, CWORD, CCTL, CCTL,CCTL,CCTL,CCTL, CCTL,CCTL,CCTL,CCTL, Thanks, that was what was wrong. Although mksyntax did get re-executed because of your Makefile change, re-executing it didn't help: mksyntax doesn't use parser.h at run time, only at build time. So I think the Makefile.am change should instead be: --- a/src/Makefile.am +++ b/src/Makefile.am @@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax signames.c: mksignames ./$^ -mksyntax: token.h +mksyntax: parser.h token.h $(HELPERS): %: %.c $(COMPILE_FOR_BUILD) -o $@ $< Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/24/18 1:33 AM, Herbert Xu wrote: On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote: On 2/21/18 2:50 PM, Denys Vlasenko wrote: I propose replacing this: Agreed, that looks better. Thanks! Good work guys. However, could you check to see if this patch works too? It passes all my tests so far. It seems like the new control character doesn't get escaped in one place where it should be, and gets lost because of that: Unpatched: $ dash -c 'x=`printf \\\211X`; echo $x | cat -v' M-^IX Patched: $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v' X Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/21/18 2:50 PM, Denys Vlasenko wrote: I propose replacing this: Agreed, that looks better. Thanks! Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/18/18 11:50 PM, Harald van Dijk wrote: On 2/14/18 11:50 PM, Harald van Dijk wrote: On 2/14/18 10:44 PM, Harald van Dijk wrote: On 2/14/18 9:03 PM, Harald van Dijk wrote: On 13/02/2018 14:53, Denys Vlasenko wrote: $ >'\' $ >'\' $ dash -c 'echo "\*"' \ \ [...] Currently: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' <> This is what I expect, and also what bash, ksh and posh do. With your patch: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' Does the attached look right as an alternative? It treats a quoted backslash the same way as if it were preceded by CTLESC in _rmescapes. It passes your test case and mine, but I'll do more extensive testing. It causes preglob's string to potentially grow larger than the original. When called with RMESCAPE_ALLOC, that can be handled by increasing the buffer size, but preglob also gets called without RMESCAPE_ALLOC to modify a string in-place. That's never going to work with this approach. Back to the drawing board... There is a way to make it work: ensure sufficient memory is always available. Instead of inserting CTLESC, which caused problems, CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a no-op here. It required one obvious additional trivial change to the CHECKSTRSPACE invocation, but with that added, the attached passed all testing I could think of. Does this look okay to include, did I miss something, or is there perhaps a better alternative? Cheers, Harald van Dijk diff --git a/src/expand.c b/src/expand.c index 2a50830..af88a69 100644 --- a/src/expand.c +++ b/src/expand.c @@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag) } if (*p == (char)CTLESC) { p++; - if (notescaped) -*q++ = '\\'; - } else if (*p == '\\' && !inquotes) { - /* naked back slash */ - notescaped = 0; - goto copy; + goto escape; + } else if (*p == '\\') { + if (inquotes) { +escape: +if (notescaped) + *q++ = '\\'; + } else { +/* naked back slash */ +notescaped = 0; +goto copy; + } } notescaped = globbing; copy: diff --git a/src/parser.c b/src/parser.c index 382658e..a847b2e 100644 --- a/src/parser.c +++ b/src/parser.c @@ -909,7 +909,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) #endif CHECKEND(); /* set c to PEOF if at end of here document */ for (;;) { /* until end of line or end of word */ - CHECKSTRSPACE(4, out); /* permit 4 calls to USTPUTC */ + CHECKSTRSPACE(5, out); /* permit 5 calls to USTPUTC */ switch(syntax[c]) { case CNL: /* '\n' */ if (syntax == BASESYNTAX) @@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) eofmark != NULL ) ) { + /* Reserve extra memory in case this backslash will require later escaping. */ + USTPUTC(CTLQUOTEMARK, out); + USTPUTC(CTLQUOTEMARK, out); USTPUTC('\\', out); } USTPUTC(CTLESC, out);
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 2/14/18 11:50 PM, Harald van Dijk wrote: On 2/14/18 10:44 PM, Harald van Dijk wrote: On 2/14/18 9:03 PM, Harald van Dijk wrote: On 13/02/2018 14:53, Denys Vlasenko wrote: $ >'\' $ >'\' $ dash -c 'echo "\*"' \ \ [...] Currently: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' <> This is what I expect, and also what bash, ksh and posh do. With your patch: $ dash -c 'foo=a; echo "<${foo#[a\]]}>"' Does the attached look right as an alternative? It treats a quoted backslash the same way as if it were preceded by CTLESC in _rmescapes. It passes your test case and mine, but I'll do more extensive testing. It causes preglob's string to potentially grow larger than the original. When called with RMESCAPE_ALLOC, that can be handled by increasing the buffer size, but preglob also gets called without RMESCAPE_ALLOC to modify a string in-place. That's never going to work with this approach. Back to the drawing board... There is a way to make it work: ensure sufficient memory is always available. Instead of inserting CTLESC, which caused problems, CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a no-op here. I'm currently testing the attached. To be honest, FreeBSD sh's approach, keeping a syntax stack to detect characters' meaning reliably at parse time, feels more elegant to me right now, but that requires invasive and therefore risky changes to dash's code. Cheers, Harald van Dijk diff --git a/src/expand.c b/src/expand.c index 2a50830..af88a69 100644 --- a/src/expand.c +++ b/src/expand.c @@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag) } if (*p == (char)CTLESC) { p++; - if (notescaped) -*q++ = '\\'; - } else if (*p == '\\' && !inquotes) { - /* naked back slash */ - notescaped = 0; - goto copy; + goto escape; + } else if (*p == '\\') { + if (inquotes) { +escape: +if (notescaped) + *q++ = '\\'; + } else { +/* naked back slash */ +notescaped = 0; +goto copy; + } } notescaped = globbing; copy: diff --git a/src/parser.c b/src/parser.c index 382658e..bb16a46 100644 --- a/src/parser.c +++ b/src/parser.c @@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) eofmark != NULL ) ) { + /* Reserve extra memory in case this backslash will require later escaping. */ + USTPUTC(CTLQUOTEMARK, out); + USTPUTC(CTLQUOTEMARK, out); USTPUTC('\\', out); } USTPUTC(CTLESC, out);