Re: getopts appears to not be shifting $@ when consuming options

2021-01-30 Thread Harald van Dijk

On 29/01/2021 20:15, Harald van Dijk wrote:
I would suggest that if this is changed to conform to POSIX, a 
non-standard method should remain available to allow shell functions to 
use getopts internally, including when the getopts loop in the function 
calls other functions that themselves use getopts, to ensure that any 
existing scripts broken by the change can be easily updated. One way to 
achieve that could be to special-case "local OPTIND=1" so that when the 
function returns, it restores not just the value of OPTIND, but uses 
that moment to additionally restore the extra internal state.


I have found that "local OPTIND=1" already works that way in bash, 
restoring upon function return not only the value of OPTIND but also the 
extra internal extra state. It is designed to work like that, not just a 
happy accident: it is listed as a new feature of 4.4. Because of that, 
if something like this is implemented, this would probably be the best 
syntax to use.


I have also found that this is easy to implement: currently, 
getoptsreset() just sets the internal state (shellparam.optind and 
shellparam.optoff) based on the text value of OPTIND. Instead, if:

- var.h defines a new flag to indicate that special hidden data is
  present past the end of OPTIND's text value,
- getopts() builds its own buffer, including the extra hidden data, and
  calls setvareq() directly with that special flag rather than relying
  on setvarint(),
- getoptsreset() checks it to either set the internal state as before
  or restore the internal state, depending on whether that special flag
  is set,
everything just works with no special casing needed in the 
implementation of the "local" command. The net result is even a small 
reduction in code size.


I am only posting a description of the changes rather than a patch or a 
link to a patch because I implemented this in my shell (a fork of dash), 
not dash itself, and the changes as I implemented them cannot be applied 
to dash without some modifications.


Cheers,
Harald van Dijk


Re: getopts appears to not be shifting $@ when consuming options

2021-01-29 Thread Harald van Dijk

On 29/01/2021 20:36, Jilles Tjoelker wrote:

Unfortunately, dash and FreeBSD sh reset the getopts state when the
positional parameters are modified via set or shift. They probably do
this to avoid use after free and out of bounds memory access when a
script violates POSIX's rule.


In dash, the getopts command guards against this. getoptscmd() checks 
that shellparam.optind is in bounds before calling getopts(), and 
getopts() checks that shellparam.optoff is in bounds, so as far as 
correctness goes, it should be enough to just remove the resetting of 
shellparam.optind and shellparam.optoff in those places that are not 
supposed to reset the state.


Cheers,
Harald van Dijk


Re: getopts appears to not be shifting $@ when consuming options

2021-01-29 Thread Harald van Dijk

Hi,

On 29/01/2021 18:25, earnestly wrote:

In this example dash will repeatedly append 'attr=foo' to the list of
parameters in an infinite loop:

 #!/bin/dash -x

 while getopts :a: arg -a foo -a bar; do
 case $arg in
 a) set -- "$@" attr="$OPTARG"
 esac
 done
 shift "$((OPTIND - 1))"

Instead I expected this to result in parameter list containing
'attr=foo' and 'attr=bar'.


You are correct in your expectation, I believe: ending the loop after 
processing both arguments is what your script should do.


The reason that dash is behaving the way it does is that dash resets the 
getopts state when the positional arguments are changed by the set or 
shift commands. The getopts state is also reset when the positional 
arguments are changed because of a function call, but restored after the 
function returns. This is something shared across ash-based shells; you 
will also find it in FreeBSD's /bin/sh, and if I am not misreading the 
code, NetBSD's /bin/sh as well.


Although there are certainly cases where this behaviour is useful, 
especially the part where it saves and restores state for function 
calls, there are also where it is not, such as yours. Additionally, it 
appears to be in conflict with POSIX, which requires the getopts state 
to be preserved as long as it continues to be called with the same 
arguments: only resetting OPTIND to 1 is specified to reset the state to 
allow arguments to be parsed anew.


I would suggest that if this is changed to conform to POSIX, a 
non-standard method should remain available to allow shell functions to 
use getopts internally, including when the getopts loop in the function 
calls other functions that themselves use getopts, to ensure that any 
existing scripts broken by the change can be easily updated. One way to 
achieve that could be to special-case "local OPTIND=1" so that when the 
function returns, it restores not just the value of OPTIND, but uses 
that moment to additionally restore the extra internal state.


Cheers,
Harald van Dijk


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2021-01-10 Thread Harald van Dijk

On 23/12/2020 20:18, Harald van Dijk wrote:

On 21/12/2020 16:24, Jilles Tjoelker wrote:

Tradition is for job control shells to be a process group leader, but I
don't really know why. Changing this will not fix the issue entirely
anyway since the shell must perform tcsetpgrp() from the background when
a foreground job has terminated.

[...]
This should, in my opinion, *only* happen for interactive shells, not 
for job-control-enabled non-interactive shells. [...]

Consider also an extra difficulty arising from this process group switching:

  : | dash -m

When a job-control-enabled shell terminates, or when job control is 
disabled via set +m, it attempts to re-join its initial process group 
and set that as the foreground process group. This can fail if the 
process group has ceased to exist after the shell left it, as it does 
here, resulting in:


  $ : | dash -m
  dash: 0: Cannot set tty process group (No such process)

In theory, because of PID reuse, this may even result in some random 
unrelated process group temporarily becoming the foreground process group.


I am leaning towards saying that once the shell has committed to 
becoming a process group leader, it should remain one. By basing this on 
the shell being interactive rather than on job control being enabled, 
this is easier to handle, as as far as POSIX is concerned "interactive" 
is a property that cannot be changed once the shell has started: set -i 
and set +i are extensions not required by POSIX, and if they are 
nonetheless supported it is easy to defend them not being fully 
equivalent to specifying or leaving out -i on the shell invocation.


Cheers,
Harald van Dijk


Re: [v3 PATCH 17/17] eval: Add vfork support

2021-01-08 Thread Harald van Dijk

On 18/05/2018 19:39, Herbert Xu wrote:

This patch adds basic vfork support for the case of a simple command.
...  
@@ -879,17 +892,30 @@ forkchild(struct job *jp, union node *n, int mode)

}
}
if (!oldlvl && iflag) {
-   setsignal(SIGINT);
-   setsignal(SIGQUIT);
+   if (mode != FORK_BG) {
+   setsignal(SIGINT);
+   setsignal(SIGQUIT);
+   }
setsignal(SIGTERM);
}
+
+   if (lvforked)
+   return;
+
for (jp = curjob; jp; jp = jp->prev_job)
freejob(jp);
  }


This leaves SIGQUIT ignored in background jobs in interactive shells.

  ENV= dash -ic 'dash -c "kill -QUIT \$\$; echo huh" & wait'

As of dash 0.5.11, this prints "huh". Before, the subprocess process 
killed itself before it could print anything. Other shells do not leave 
SIGQUIT ignored.


(In a few other shells, this also prints "huh", but in those other 
shells, that is because the inner shell chooses to ignore SIGQUIT, not 
because the outer shell leaves it ignored.)


Cheers,
Harald van Dijk


Re: [PATCH] jobs: Block signals during tcsetpgrp

2021-01-06 Thread Harald van Dijk

On 06/01/2021 04:45, Herbert Xu wrote:

This patch implements the blocking of SIGTTOU (and everything else)
while we call tcsetpgrp.

Reported-by: Steffen Nurpmeso 
Signed-off-by: Herbert Xu 

diff --git a/src/jobs.c b/src/jobs.c
index 516786f..809f37c 100644
--- a/src/jobs.c
+++ b/src/jobs.c
@@ -1512,7 +1512,13 @@ showpipe(struct job *jp, struct output *out)
  STATIC void
  xtcsetpgrp(int fd, pid_t pgrp)
  {
-   if (tcsetpgrp(fd, pgrp))
+   int err;
+
+   sigblockall(NULL);
+   err = tcsetpgrp(fd, pgrp);
+   sigclearmask();
+
+   if (err)
sh_error("Cannot set tty process group (%s)", strerror(errno));
  }
  #endif


While this is a step in the right direction, Jilles has already replied 
with an explanation of why this is not enough: if the terminal is in 
TOSTOP mode, it's not just tcsetpgrp() that needs to be handled, it's 
any write as well that may occur while the shell is not in the 
foreground process group. While it may be working according to design 
for messages written when the shell is not supposed to be in the 
foreground process group, it is another story when the shell is both 
responsible for taking itself out of the foreground process group and 
for writing a message. This is made worse by the fact that there is no 
synchronisation with child processes on errors, so even forcibly 
restoring the foreground process group may not be enough: unfortunate 
scheduling may result in a child process immediately setting the 
foreground process group to the child process after the parent process 
attempted to restore it to itself. I have not yet seen a good solution 
for this.


Cheers,
Harald van Dijk


Re: set -I is not required by standard, and does not match bash

2021-01-04 Thread Harald van Dijk

On 04/01/2021 12:22, Denys Vlasenko wrote:

Hello,

In dash, set -I is a short-option alias to set -o ignoreeof.

However, bash does not have such alias, it has an undocumented
set -I which switches off "invisible variables"
(I don't know what that is).


bash does not have any set -I. bash *had* a set -I, but dropped it in 
version 5.1, so there is no longer a compatibility issue to worry about.


Cheers,
Harald van Dijk


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-23 Thread Harald van Dijk

On 21/12/2020 16:24, Jilles Tjoelker wrote:

Also, simply entering the command
 trap "echo TTOU" TTOU
in an interactive shell makes it behave strangely. Created jobs
immediately stop, "fg" works once but after that the shell tends to get
stuck quickly.


Good catch, there is some work to be done there too.


This seems a good approach, but certain writes to the terminal may need
the same treatment, for example the error message when fork fails for
the second and further elements of a pipeline. This also makes me wonder
why SIGTTOU is ignored at all by default.


True. This is hard to create a reliable test case for, but there is only 
limited code that can get executed while a job-control-enabled shell may 
not be in the foreground process group.


If fork fails halfway through a pipeline, it may also be a problem that 
some of the commands in the pipeline may still run.



In mksh, the issue is resolved slightly differently: setting a trap on
TTOU pretends to work but the signal disposition remains set to ignored.


zsh also rejects traps on TTOU, but does so explicitly:

  zsh: can't trap SIGTTOU in interactive shells

Amusingly, it prints this in any shell where job control is enabled, 
regardless of whether the shell is interactive.



Tradition is for job control shells to be a process group leader, but I
don't really know why. Changing this will not fix the issue entirely
anyway since the shell must perform tcsetpgrp() from the background when
a foreground job has terminated.


I've been thinking more about this, and I suspect it's a another 
conflation between interactive mode and job control. In an interactive 
shell that's not executing any external program, you want any Ctrl-C to 
only send SIGINT to that shell, not to any other process. In order for 
that to work, it needs to be its own process group.


This should, in my opinion, *only* happen for interactive shells, not 
for job-control-enabled non-interactive shells. Consider


  sh -c 'sh -mc "while :; do :; done"; echo bye'

The behaviour I would want is that Ctrl-C kills the parent shell, so 
that this does not print "bye". Different shells behave differently.



What is definitely required, though, is that the shell not read input or
alter terminal settings if it is started in the background (possibly
unless the script explicitly ignored SIGTTOU). This is what the loop
with tcgetpgrp() does.


Definitely.

Cheers,
Harald van Dijk


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-19 Thread Harald van Dijk

On 19/12/2020 22:21, Steffen Nurpmeso wrote:

Steffen Nurpmeso wrote in
  <20201219172838.1b-wb%stef...@sdaoden.eu>:
  |Long story short, after falsely accusing BSD make of not working

After dinner i shortened it a bit more, and attach it again, ok?
It is terrible, but now less redundant than before.
Sorry for being so terse, that problem crosses my head for about
a week, and i was totally mislead and if you bang your head
against the wall so many hours bugs or misbehaviours in a handful
of other programs is not the expected outcome.


I think a minimal test case is simply

all:
$(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good'

unless I accidentally oversimplified.

The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make 
its newly started process group the foreground process group when job 
control is enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's 
in dash, the other variants may have some small differences.) tcsetpgrp 
has this little bit in its specification:


   Attempts to use tcsetpgrp() from a process which is a member of
   a background process group on a fildes associated with its con‐
   trolling  terminal  shall  cause the process group to be sent a
   SIGTTOU signal. If the calling thread is blocking SIGTTOU  sig‐
   nals  or  the  process is ignoring SIGTTOU signals, the process
   shall be allowed to perform the operation,  and  no  signal  is
   sent.

Ordinarily, when job control is enabled, SIGTTOU is ignored. However, 
when a trap action is specified for SIGTTOU, the signal is not ignored, 
and there is no blocking in place either, so the tcsetpgrp() call is not 
allowed.


The lowest impact change to make here, the one that otherwise preserves 
the existing shell behaviour, is to block signals before calling 
tcsetpgrp and unblocking them afterwards. This ensures SIGTTOU does not 
get raised here, but also ensures that if SIGTTOU is sent to the shell 
for another reason, there is no window where it gets silently ignored.


Another way to fix this is by not trying to make the shell start a new 
process group, or at least not make it the foreground process group. 
Most other shells appear to not try to do this.


Cheers,
Harald van Dijk


Re: Changes to job handling cause hangs in wait

2020-12-01 Thread Harald van Dijk

On 01/12/2020 10:56, Herbert Xu wrote:

On Tue, Dec 01, 2020 at 10:55:06AM +, Harald van Dijk wrote:


You wrote: "So the problem is really in the parent of this shell, which
appears to be bash:"


You should read my follow-up email too that suggested changing
the systemd script.


I do not appreciate your false accusation that I did not read your 
follow-up email. You suggested that as an alternative, without 
retracting your claim that this is a problem in bash.


Cheers,
Harald van Dijk


Re: Changes to job handling cause hangs in wait

2020-12-01 Thread Harald van Dijk

On 01/12/2020 10:53, Herbert Xu wrote:

On Tue, Dec 01, 2020 at 10:50:19AM +, Harald van Dijk wrote:


This used to exit immediately, leaving the sleep process running in the
background without waiting for it. On the dash that's currently provided by
Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from
Git, it does not. This is clearly a change in behaviour in dash not in
response to any bug (real or not) in bash.


I'm not suggesting it's a bug in bash.


You wrote: "So the problem is really in the parent of this shell, which 
appears to be bash:"


Cheers,
Harald van Dijk


Re: Changes to job handling cause hangs in wait

2020-12-01 Thread Harald van Dijk

On 01/12/2020 10:34, Herbert Xu wrote:

On Tue, Dec 01, 2020 at 10:14:04AM +, Harald van Dijk wrote:


POSIX says:

   "If  the  wait utility is invoked with no operands, it shall wait until
all process IDs known to the invoking shell have terminated and exit with a
zero exit status."

I would say that child processes that were created before dash was started
do not have process IDs known to dash.


Well I disagree


Then dash still has a bug regardless:

  bash -c 'dash -c "wait" <(sleep 1d)'

Here, dash does not wait for the child process.

The sleep process is either known to dash, in which case dash must wait 
in both examples, or not known to dash, in which case dash must wait in 
neither example.



and the fact that the original ksh does the same
thing is an important fact.


The original ksh is well-known not to conform to the current POSIX 
requirements, in some cases because POSIX changed requirements, in some 
cases because ksh's behaviour was clearly buggy and not worth standardising.



bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; 
buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export 
AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 
"/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export 
DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   
LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT 
PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch 
/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a 
/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);

For some reason this is causing the final two tee's to be created
as children of debian/tests/timedated rather than the bash shell.


This is because of the same optimisation that dash also has, where it tries
to avoid creating a subshell for the last command in a list when it can just
exec() without a fork() instead. A minimal example without an explicit exec
is

   bash -c 'dash -c ": & wait" <(sleep 1d)'


I'm not sure about that because bash itself is still hanging around,
if it were really the -c optimisation then bash should not appear in
the ps output at all.


Good point, it's a bit more complicated. Can take a more in-depth look 
at that example later.


Here's a simpler example without bash at all, then, where it is far 
clearer that the -c optimisation is responsible:


  dash -c 'sleep 1d & dash -c ": & wait"'

This used to exit immediately, leaving the sleep process running in the 
background without waiting for it. On the dash that's currently provided 
by Ubuntu, based on 0.5.10.2, it still behaves that way. With current 
dash from Git, it does not. This is clearly a change in behaviour in 
dash not in response to any bug (real or not) in bash.


Cheers,
Harald van Dijk


Re: Changes to job handling cause hangs in wait

2020-12-01 Thread Harald van Dijk

On 01/12/2020 06:06, Herbert Xu wrote:

On Tue, Dec 01, 2020 at 04:42:03PM +1100, Herbert Xu wrote:


Nevermind, I see that the script has been modified to use bash.

I can reproduce the problem now so it's all good.


OK the problem is this:

sh -c 'sleep 1d& exec $MYSHELL -c "sleep 1& wait"'

You can replace MYSHELL with whatever shell you want to use.

Essentially dash will now wait for all children, even ones that
were created prior to its existence, however, bash only waits for
children that it created directly.

FWIW ksh exhibits the same behaviour as dash and I think there
is nothing wrong with this.


POSIX says:

  "If  the  wait utility is invoked with no operands, it shall wait 
until all process IDs known to the invoking shell have terminated and 
exit with a zero exit status."


I would say that child processes that were created before dash was 
started do not have process IDs known to dash.



So the problem is really in the parent of this shell, which appears
to be bash:

bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; 
buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export 
AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 
"/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export 
DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   
LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT 
PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch 
/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a 
/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);

For some reason this is causing the final two tee's to be created
as children of debian/tests/timedated rather than the bash shell.


This is because of the same optimisation that dash also has, where it 
tries to avoid creating a subshell for the last command in a list when 
it can just exec() without a fork() instead. A minimal example without 
an explicit exec is


  bash -c 'dash -c ": & wait" <(sleep 1d)'

Cheers,
Harald van Dijk


Re: [bug] dash eats one line

2020-11-29 Thread Harald van Dijk

Hi,

On 27/11/2020 12:16, Pedro wrote:

Hi,

I am using dash on debian 10 buster (stable release) which according to 
`dpkg -S dash` is `Version: 0.5.10.2-5`


evilham encountered a bug where dash eats one line but not in freebsd's 
shell nor bash


attached you can find an example to reproduce it

execution of the example in my computer (enableUFWRules is not in new 
line):


  ./dash-bug.sh
# Enable UFW rules only if requested (it was)enableUFWRules

  $ bash dash-bug.sh
# Enable UFW rules only if requested (it was)
enableUFWRules


This looks like 
<https://www.mail-archive.com/dash@vger.kernel.org/msg01883.html>, which 
has been fixed in dash 0.5.11. I can confirm that your script produces 
the expected output with dash 0.5.11 as well as dash built from current Git.


Cheers,
Harald van Dijk


Re: [PATCH] Don't include config.h when building helpers using the native compiler

2020-06-22 Thread Harald van Dijk

On 22/06/2020 06:25, Fabrice Fontaine wrote:

config.h contains settings for the cross compiler (most importantly
32/64bit versions of functions), so don't include it when calling the
native compiler to build the helpers.

Otherwise we get build errors like:

/usr/bin/gcc -include ../config.h -DBSD=1 -DSHELL -DIFS_BROKEN  -g -O2 -Wall
-o mkinit mkinit.c
In file included from /usr/include/sys/stat.h:107,
  from /usr/include/fcntl.h:38,
  from mkinit.c:50:
/usr/include/bits/stat.h:117: error: redefinition of ‘struct stat’
In file included from /usr/include/fcntl.h:38,
  from mkinit.c:50:
/usr/include/sys/stat.h:504: error: redefinition of ‘stat’
/usr/include/sys/stat.h:455: note: previous definition of ‘stat’ was here

Signed-off-by: Peter Korsgaard 
[Retrieved from:
https://git.buildroot.net/buildroot/tree/package/dash/0001-no-config.h-for-helpers.patch]
Signed-off-by: Fabrice Fontaine 


A better version of this patch has already been submitted and accepted 
in 2018: <https://www.spinics.net/lists/dash/msg01629.html>.


Cheers,
Harald van Dijk


Re: Improving xtrace output from subshells

2020-05-28 Thread Harald van Dijk

On 29/04/2019 23:55, Harald van Dijk wrote:

On 29/04/2019 15:44, Vadim Zeitlin wrote:
On Sat, 27 Apr 2019 22:58:23 +0100 Harald van Dijk 
 wrote:

HvD> It wastes memory: the code should be re-worked so that output and
HvD> preverrout share the same buffer, given that there can never be 
pending

HvD> output for one when the other is being written to.

  I'm a bit worried about this one, it doesn't seem completely obvious 
to me

that the 2 objects can't be used at the same time.


It's indeed not completely obvious, but all bits that print to output 
will flush the buffer when they're done, before a next command starts 
executing, before the next command requires an xtrace string to be 
printed. It's usually the flushall() in evalbltin() that handles this.


I have managed to verify that using a single buffer for all streams 
(output, errout and preverrout) works well and simplifies logic so that 
the shell with that change is actually smaller than the one without it. 
I do not know whether there would be interest in getting that in dash.


Cheers,
Harald van Dijk


Re: Bug in Dash's unquoting of backslashes within backquoted strings

2020-05-21 Thread Harald van Dijk

On 21/05/2020 22:38, Ron Yorston wrote:

Matt Whitlock wrote:

A minimal example:

: `: "
\\$(bug)"`



However, when it appears inside a backquoted subcommand (with the
backslash characters being appropriately escaped), such as given at the top
of this report, then Dash processes it incorrectly:

/bin/sh: 1: bug: not found


This seems to have been introduced by commit 6bbc71d (parser: use
pgetc_eatbnl() in more places).  Reverting the following part of the
commit makes the problem go away:


Yes, that commit (a patch I had submitted) was buggy and the part that 
you suggest reverting should be.


Cheers,
Harald van Dijk


Re: [v2 PATCH] parser: Catch errors in expandstr

2020-05-17 Thread Harald van Dijk

On 28/04/2020 07:17, Herbert Xu wrote:

---8<---
On Fri, Dec 13, 2019 at 02:51:34PM +, Simon Ser wrote:

Just noticed another dash bug: when setting invalid PS1 values dash
enters an infinite loop.

For instance, setting PS1='$(' makes dash print many of these:

dash: 1: Syntax error: end of file unexpected (expecting ")")

It would be nice to fallback to the default PS1 value on error.


This patch fixes it by using the literal value of PS1 should an
error occur during expansion.

On Wed, Feb 26, 2020 at 09:12:04PM +, Ron Yorston wrote:


There's another case that should be handled.  PS1='`xxx(`' causes the
shell to exit because the old-style backquote leaves an additional file
on the stack.


Ron's change has been folded into this patch.


This still does not restore the state completely. It does not clean up 
any pending heredocs. I see:


  $ PS1='$(<

That is, after entering the ':' command, the shell is still trying to 
read the heredoc from the prompt.


I have not looked in detail to see if anything else is not getting 
cleaned up that should be.


Cheers,
Harald van Dijk


Re: [PATCH] build: delete AC_PROG_YACC

2019-10-12 Thread Harald van Dijk

On 10/10/2019 17:42, Fangrui Song wrote:

src/arith.y was deleted by commit f6e3b2 [ARITH] Add assignment and intmax_t 
support.
We can delete AC_PROG_YACC to get rid of the build-time dependency.


Just a minor point of clarification: there is no build-time dependency 
on bison/yacc. The configure script checks whether bison/yacc exists, 
but if not, happily continues. This change does not get rid of a 
build-time dependency, it cleans up the configure script by removing an 
unused check.


Which is obviously still an improvement so your patch should still go in.

Cheers,
Harald van Dijk


Re: Improving xtrace output from subshells

2019-04-27 Thread Harald van Dijk

On 27/04/2019 15:05, Vadim Zeitlin wrote:

On Sat, 27 Apr 2019 01:14:29 +0100 Harald van Dijk  wrote:

HvD> I don't believe there is any requirement for your expectation and
HvD> several other shells share this behaviour, but agreed that it would be a
HvD> nice improvement to avoid this.

  Yes, this is certainly just an expectation and not something guaranteed by
POSIX, but it looks like a very reasonable expectation to me. And, speaking
of the other shells, all of the ones I tested (bash, ksh, zsh) do keep the
lines together.


bosh and yash behave the same as dash.


HvD> Please keep in mind that even without this, you would not have
HvD> deterministic output. The commands in a pipeline could be output in any
HvD> order, even if each command would be output on its own line.

  In theory, it seems like you're right and it should be possible, but in
practice it just doesn't seem to ever happen, i.e. the output is always in
the same left-to-right order. Maybe there is something that implicitly
serializing the output from the pipeline components either in the shell or
in the kernel? I admit I haven't dived into this deep enough to really find
out, the fact that the order remains always the same in practice is good
enough for me.


It seems to mostly work that way in bash, and I have no dug into the 
reason for that either. You tested ksh and zsh as well. In ksh and zsh, 
I do commonly see variations in the order in which the lines are 
printed, and with patched dash, I see the same variations.


Rarely, I see out-of-order prints even in bash, so it does seem like 
it's just a coincidence as a result of the way of bash is written, not 
something that bash actively attempts to guarantee. Try it:


  c=0
  while :
  do
xtrace=$(bash -o posix -xc ': a | : b' 2>&1)
case $xtrace in
*a*b*) : $((c+=1)) ;;
*) echo $c
   echo "$xtrace"
   break ;;
esac
  done

It takes hundreds or sometimes even thousands of runs, but eventually, I 
do see + : b followed by + : a.



HvD> As a proof of concept that should clearly not be committed in its
HvD> current form, please see the attached patch.

  Thanks, this is indeed much simpler than what I had in mind and I can
confirm that it does work.

  What prevents this patch from being applied in the current form exactly?


I removed a #ifdef FLUSHERR which suggests to me this is supposed to be 
optional, not unconditional.


The macro OUTBUFSIZ should be renamed if it is no longer just the size 
of the "output" buffer.


I only touched the case where USE_GLIBC_STDIO is undefined. (That said, 
the case where USE_GLIBC_STDIO is defined is already broken, so this may 
be okay.)


It wastes memory: the code should be re-worked so that output and 
preverrout share the same buffer, given that there can never be pending 
output for one when the other is being written to. (This may apply to 
errout as well, if someone wants to enable buffering for that.)


Cheers,
Harald van Dijk


Re: Improving xtrace output from subshells

2019-04-26 Thread Harald van Dijk

On 25/04/2019 16:55, Vadim Zeitlin wrote:

  Hello,

  dash currently has a problem with output from "-x" option being
non-deterministic when subshells are used (e.g. whenever pipelines are).
This was reported a long time ago as a Debian bug[1] but AFAICS has never
been posted here, so at the very least I'd like to do it and, ideally, also
fix it. But first let me describe the bug in more details:

  When you run the following command:

$ dash -xc 'z=`echo | cut -f1 | sort`'

the following output would be expected:

+ echo
+ cut -f1
+ sort
+ z=

And, indeed, this is what you usually get. However the output is
non-deterministic and you can also get weirder things, with the output from
different subshells intermixed together. A simple way to see it is to run
the following loop:

$ while true; do dash -xc 'z=`echo | cut -f1 | sort`' 2>&1 | grep cut; 
done | fgrep -vx '+ cut -f1'

which is not supposed to output anything, but in practice it does quickly
give some output, e.g. here is what I see when running it with the latest
dash version from git (48875c1201930d1e71d358eb1cf3eacc166795be):

cut -f1
+ + cut -f1
+ cut+  -f1
+ cut+  -f1
+ cut+  -f1
cut -f1
+ + echocut
cut -f1
cut -f1
+ + cutsort
+ cut -f1+


I don't believe there is any requirement for your expectation and 
several other shells share this behaviour, but agreed that it would be a 
nice improvement to avoid this.



In practice, this is very annoying as it makes it difficult to compare the
logs of the same script run using "-x" as for any non-trivial amount of
output there will always be spurious differences between the output
produced by the different runs.


  As the comment from Tomi Ollila in the Debian bug report says, the problem
is due to using several different write() calls for output. Looking at the
sources, this happens inside "if (xflag)" block in evalcommand() function
starting at the line 854 in eval.c, where outstr(), eprintlist() twice, and
outcslow() are used for the single line of output.


Please keep in mind that even without this, you would not have 
deterministic output. The commands in a pipeline could be output in any 
order, even if each command would be output on its own line.



  To fix the problem we need to use only a single call instead and I see 2
possibilities here: first the obvious one which would be to output
everything in a temporary memory buffer and output the entire buffer at
once to preverrout at the end. This approach has some problems, however:

1. It would consume some extra memory and while this should be negligible
in practice, it's arguably still better to avoid it.

2. Support for memory output is disabled using "#ifdef notyet" in the
current sources and enabling it would enable quite a few other things,
so I'm not sure if it's really a good idea.

3. Without enabling memory output, the code would need to be changed
relatively extensively and I'd need to either refactor mkinit.c to
extract "struct text" and the related addstr()/addchar() functions
from it and make it possible to reuse it from eval.c, or reimplement
something quite close to it, which is not appealing.


The #ifdef notyet is more than is needed for this. The output mechanism 
already supports buffering. It's easy to adapt this to work for xtrace 
output as well. You'd only run into issues if you have a command string 
that's larger than the buffer size, but I think that should be 
considered acceptable.


As a proof of concept that should clearly not be committed in its 
current form, please see the attached patch.



The other (and the only other, AFAICS) possibility would be to use writev()
to output everything at once instead. This avoids (1) (but, again, I don't
think it's really worth avoiding it that much), but would also require
making rather extensive changes to the code. I'm also not sure if writev()
is supported on all platforms targeted by dash, but hopefully it should be,
in 2019.

  I'd appreciate your advice about what kind of approach would be the most
appropriate and most likely to be accepted. Personally, I think of enabling
support for memory-backed output structs (only, i.e. without touching all
the rest of the code guarded by "#ifdef notyet") and using it here, but if
this is undesirable, for whatever reason, I'd go with writev() because as
long as we need to change the code structure anyhow, it's not really more
complicated than using a temporary buffer.

  Thanks in advance!
VZ

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567648

--- a/src/eval.c
+++ b/src/eval.c
@@ -861,9 +861,7 @@ bail:
 		sep = eprintlist(out, varlist.list, sep);
 		eprintlist(out, osp, sep);
 		outcslow('\n', out);
-#ifdef FLUSHERR
 		flushout(out);
-#endif
 	}
 
 	/* Now locate the command. */
--- a/src/output.c
+++ b/src/output.c
@@ -88,7 +88,9 @@ struct output output 

Re: POSIX compliance issues with case statements

2019-04-17 Thread Harald van Dijk

Hi,

On 17/04/2019 17:29, Drew DeVault wrote:

Unless I'm misinterpreting the specification, it seems like dash doesn't
handle pattern matching in case statements correctly. The following
sample demonstrates the issue:

#!/bin/sh -eu
while read -r line


This will strip leading and trailing whitespace. To prevent that, you 
need to set IFS to an empty string first.



do
case "$line" in
[:space:]*:)


This will match on a first character that is ':', 's', 'p', 'a', 'c', 
'e', or ':' (again), not a first character that is a space. You want 
[[:space:]]*: here.



echo "a"
;;
*:)
echo "b"
;;
*)
echo "c"
;;
esac
done <https://mirror.sr.ht/alpine/sr.ht/
EOF

The expected output is b a c c, but the output in practice is b b c c.
Quoting the spec:


http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05

...case shall execute the compound-list corresponding to the first one of
several patterns (see Pattern Matching Notation)...

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

...patterns matching a single character shall match a single character:
ordinary characters, special pattern characters, and pattern bracket
expressions...

...an open bracket... shall introduce a pattern bracket expression... as in
XBD RE Bracket Expression...

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

...The following character class expressions shall be supported in all
locales:

[:space:]


Character class expressions cannot be used by themselves, they can only 
be part of bracket expressions, which is why you need the double [[ and ]].




Am I misinterpreting the spec or is this indeed a problem with dash? In
addition to this issue, the following example cause a parsing error:

[ \t])

Both problems can be reproduced with bash.


[ \t] does not work because the space ends the word. Additionally, \t 
does not mean "tab", it just means a literal "t". You could use [\ \ ] 
instead, or ["  "], where in both cases the second blank is a literal tab.


Cheers,
Harald van Dijk


Re: [v2 PATCH] eval: Reset handler when entering a subshell

2019-03-03 Thread Harald van Dijk

On 03/03/2019 17:39, Jilles Tjoelker wrote:

On Sun, Mar 03, 2019 at 04:43:09PM +, Harald van Dijk wrote:

The effect of the command built-in is that "if the command_name is the same
as the name of one of the special built-in utilities, the special properties
in the enumerated list at the beginning of Special Built-In Utilities shall
not occur." This is ambiguous as to whether it is just about the special
properties associated with the . command, or whether it includes those
associated with the set command called indirectly, but it must be one or the
other. If it includes the set command, then the shell shall not exit, and
the correct output is a followed by b. If it does not include the set
command, then the shell shall exit, and the correct output is nothing. dash
outputs b, which is wrong in either case.


The way I interpret this is that any error while parsing or executing
the . or eval command(s) is a "special built-in utility error" which can
be "caught" using "command". Therefore, the correct output is an error
message about the invalid option (to stderr) followed by b. This is what
happens in FreeBSD sh (for quite a few years), ksh93 (20120801_2) and
mksh (56c).


That's how dash implements it too, but it plainly doesn't make sense 
based on POSIX's requirements. If the effect of "shall exit" does not 
occur, the result isn't "shall start exiting and then abort the exit", 
it's "shall not exit". In order to allow dash/FreeBSD sh's 
interpretation, POSIX would need to say something like "the special 
properties [...] shall be suppressed" rather than "[...] shall not occur".


Cheers,
Harald van Dijk


Re: [v2 PATCH] eval: Reset handler when entering a subshell

2019-03-03 Thread Harald van Dijk

On 03/03/2019 13:57, Herbert Xu wrote:

On Sat, Mar 02, 2019 at 01:28:44PM +, Harald van Dijk wrote:


That is *a* real bug here, not *the* real bug. This leaves the buggy
behaviour of the "command" built-in intact in the test case I included in
the message you replied to.


I don't quite understand.  Could you explain what is still buggy
after the following patch?


I gave this example in my previous message:

  command . /dev/stdin <Ordinarily, "set -o invalid" is a "special built-in utility error", for 
which the consequence is that a non-interactive shell "shall exit". If 
it weren't a special built-in utility, it would be an "other utility 
(not a special built-in) error", for which a non-interactive shell 
"shall not exit".


The effect of the command built-in is that "if the command_name is the 
same as the name of one of the special built-in utilities, the special 
properties in the enumerated list at the beginning of Special Built-In 
Utilities shall not occur." This is ambiguous as to whether it is just 
about the special properties associated with the . command, or whether 
it includes those associated with the set command called indirectly, but 
it must be one or the other. If it includes the set command, then the 
shell shall not exit, and the correct output is a followed by b. If it 
does not include the set command, then the shell shall exit, and the 
correct output is nothing. dash outputs b, which is wrong in either case.


Cheers,
Harald van Dijk


Re: [PATCH] shell: Reset handler when entering a subshell

2019-03-02 Thread Harald van Dijk

On 28/02/2019 06:27, Herbert Xu wrote:

Thanks for the analysis.  I think the real bug here is the fact
that subshells can "escape" their jail by a longjmp that is caught
by evalcommand.


That is *a* real bug here, not *the* real bug. This leaves the buggy 
behaviour of the "command" built-in intact in the test case I included 
in the message you replied to.


For fixing the one bug, it is a sensible approach, but keep in mind that 
when a fork is omitted because of EV_EXIT, so too will the reset of 
handler. You may be able to get away with that as long as you do not 
propagate EV_EXIT in any cases where a cleanup action might cause problems.


Cheers,
Harald van Dijk


Re: [PATCH v3] redir: Handle nested exec within REALLY_CLOSED redirection

2019-01-16 Thread Harald van Dijk

On 16/01/2019 21:53, Martijn Dekker wrote:

Op 16-01-19 om 22:39 schreef Martijn Dekker:

Op 16-01-19 om 14:32 schreef Herbert Xu:

Thanks for the patch.  I took a deeper look into the history of
the bug and it turned out that I added REALLY_CLOSED as an
optimisation in order to avoid an unnecessary close(2) syscall.


Does this actually save cycles? I'm probably missing something, but that
code looks to me like it probably uses more cycles than close(2) going
'descriptor not open, return'.


Never mind, stupid question that I should have googled before asking it.
The answer is that a switch to kernel mode and back is expensive.


Still, if that's slow enough and happens commonly enough that it's worth 
avoiding, it seems like it would still be simpler, shorter and probably 
faster to just write


  if (fd != -1)
close(fd);

in a few places, since we know that the only invalid fd that ever gets 
passed to close() is -1. That avoids the other cases where dash already 
happily calls close(-1) as well.


Cheers,
Harald van Dijk


Re: [BUG] dash hangs on 'eval' syntax error in dot script

2019-01-11 Thread Harald van Dijk

On 10/01/2019 14:21, Martijn Dekker wrote:

Op 10-01-19 om 11:37 schreef Martijn Dekker:

In a dot script sourced with 'command .' (which is useful to avoid
exiting if the script doesn't exist), triggering a syntax error in an
'eval' in a subshell causes dash to hang at the end of the main script.


In fact, 'eval' doesn't appear related. I can also trigger the bug by
triggering an error in another special builtin:

command . /dev/stdin <

I do not see a hang myself, but I definitely see wrong behaviour: the 
output I get is


  $ command . /dev/stdin < ( set -o foobar ) && echo WOOPS
  > EOF
  src/dash: 1: set: Illegal option -o foobar
  $ echo end
  end
  $ 
  WOOPS
  $

This was introduced by

  commit 46d5a7fcea81b489819f753451c1ad2fe435f148
  Author: Herbert Xu 
  Date:   Tue Mar 27 00:39:35 2018 +0800

eval: Restore input files in evalcommand

When evalcommand invokes a command that modifies parsefile and
then bails out without popping the file, we need to ensure the
input file is restored so that the shell can continue to execute.

What I think it going on here, although I do not understand it 
completely yet, is:


evalcommand invokes a command that modifies parsefile: that's the dot 
command. The evaluation of the dotted file involves creating a subshell, 
and because of the invalid option, that subshell exits with EXERROR. 
Because the subshell is invoked from within the command builtin, that 
EXERROR is eaten, and the input file is restored. The subshell then 
happily continues reading commands at the same time as the parent shell.


Although this particular example used to work before that specific 
commit, I suspect the underlying problem had been there already and 
other examples could expose it just as well.


Fundamentally, I think an important question to ask is whether

  command . /dev/stdin <is supposed to abort the shell because of the invalid option, and if 
not, whether it should print a and b, or only b. Personally, I think 
there is nothing in POSIX to suggest that only b gets printed: either 
the special property of special built-in utilities that they cause the 
shell to exit on error apply, in which case nothing gets printed, or 
they do not apply, in which both a and b get printed. Nonetheless, 
printing b is the most popular result:


bash/yash/zsh print nothing.
dash/ksh/mksh/pdksh/posh print b.
bosh prints a and b.

My personal feeling is that "print nothing" is the correct result: the 
command built-in should only cause the immediately invoked command to 
not be treated as a special built-in utility, not anything indirectly 
invoked by that.


Cheers,
Harald van Dijk


Re: [BUG] 'fc -s' infinite loop

2019-01-09 Thread Harald van Dijk

On 01/01/2019 16:24, Harald van Dijk wrote:

On 01/01/2019 14:27, Martijn Dekker wrote:
On dash and on gwsh, the 'fc -s' command, that re-executes a command 
from the history, executes the command repeatedly in an infinite loop. 
It should execute it just once.


This is caused by the block with the XXX comment in src/histedit.c:


    if (displayhist && hist) {
    /*
 *  XXX what about recursive and
 *  relative histnums.
 */
    history(hist, , H_ENTER, s);
    }


This changes the position in the history list and also clobbers he, so 
the if (he.num == last) check afterwards always returns false, and if 
the check was supposed to return false, the next history(hist, , 
direction) would not do what was intended.


The immediate problem can be fixed either by dropping support for the 
non-standard and undocumented fc -s first last and immediately breaking 
out of the loop, or by restoring the position (restoring he at the same 
time).


Keeping fc -s first last working is pretty much not an option without 
modifying it to generate a full list of commands to execute first. 
Otherwise, the comment



/*
 * At end?  (if we were to lose last, we'd sure be
 * messed up).
 */


is sure to hit: executing the new commands may clear out previous 
history entries, including last. fc -s first does not have that problem 
since it only takes a single command -- except that it should enter the 
command into the history before executing it, not after it, at which 
point you can get that problem after all. However...


But there are more problems: H_ENTER adds a new command to the history. 
bash and ksh do not do this:


   echo hello
   fc -s 1
   fc -l

will show (bash, but ksh is similar)

   hello
   1    echo hello
   2    echo hello

That is, the echo hello command appears twice in the history, but the fc 
-s 1 command does not appear at all. This is what POSIX probably[*] 
requires as well.


...modifying fc to use H_REPLACE nicely avoids that problem. Since no 
new history entry is created, no old entry can be cleared out.


By the time fc is called, the fc line is already in the history, so to 
get the bash/ksh behaviour, the current command would need to be 
overwritten. The documentation of libedit does not show any option which 
can overwrite entries already created. It documents H_ADD and H_APPEND, 
which can append, and has an undocumented H_REPLACE used by the equally 
undocumented replace_history_entry() function in . 
I do not know yet if it makes sense to start using this.


Having thought about it, I think it does make sense, and have looked at 
how to use H_REPLACE. It takes a char * and a void *, where the char * 
specifies the new string for the currently selected history entry, and 
the void * is application-specific data. Since no application-specific 
data is ever used, it can simply be hardcoded to (void *) NULL, making 
sure to use a cast because history() is a variadic function.


Since H_REPLACE operates on the currently selected history entry, it 
requires H_FIRST to re-select the most-recent history entry.


A corner case is when one command line consists of multiple fc commands:

  echo abc
  echo def
  fc -s 1; fc -s 2

Here, bash and ksh let the last fc -s command "win": after this, the 
history is


  echo abc
  echo def
  echo def

but the output is

  abc
  def
  abc
  def

Using H_REPLACE replicates that.

Also, when neither -s nor -l is specified and an editor is invoked, the 
new command is supposed to be entered into the history as well. That too 
is not implemented.


This can be implemented by modifying the parser to allow creating 
history entries for files pushed back later. Currently, the check is 
limited to stdin:



if (parsefile->fd == 0 && hist && something)


but keeping track of whether the history should be saved for each file, 
and then turning it on when fc without -s/-l is used, lets this work.


Other bugs:

- fc without arguments is perfectly valid and should not raise an error.

- The extension to allow negative numeric arguments without -- should 
work on BSD but does not work on glibc, because optind is set to 0 
rather than 1.


- Commands containing blank lines are entered into the history without 
the blank lines. Entirely blank commands should not be entered in the 
history, but


  echo "hello

  world"

should not enter the history as

  echo "hello
  world"

since that has a different effect.

- fc -s's old=new option should not be accepted and ignored when -s is 
not specified. old=new is supposed to be interpreted as a search string 
in that case.


- When an out-of-range numbe

Re: [Bug report] Incorrect handling of backslashes in read -r

2019-01-04 Thread Harald van Dijk

On 04/01/2019 13:20, Peter Wu wrote:

Hi,

Dash 'read' builtin with the '-r' option is not POSIX-compliant:

 $ printf 'omg\\bar\\x41--\n' > input
 $ dash -c 'read -r x < input; cat input; echo "$x"'
 omg\bar\x41-\\-
 omar\x41-\-

The outputs are expected to be equal (which is the case with bash).
Originally found in dash 0.5.8-2.10 (Ubuntu 18.04), but also reproduced
with dash 0.5.10.2-1 (Arch Linux).


This isn't read giving the backslashes special treatment, it's echo. Use 
printf "%s\n" "$x" instead and you will get the same output.


Cheers,
Harald van Dijk


Re: [BUG] 'fc -s' infinite loop

2019-01-01 Thread Harald van Dijk

On 01/01/2019 18:59, Martijn Dekker wrote:

Op 01-01-19 om 17:24 schreef Harald van Dijk:
The immediate problem can be fixed either by dropping support for the 
non-standard and undocumented fc -s [...]


Actually, it's standard:


I didn't say fc -s was non-standard, I said fc -s first last was 
non-standard. POSIX does not allow specifying last in combination with -s.


Cheers,
Harald van Dijk


Re: [v2 PATCH] eval: Only restore exit status on exit/return

2018-12-14 Thread Harald van Dijk

On 14/12/2018 10:07, Harald van Dijk wrote:

On 14/12/2018 09:47, Herbert Xu wrote:

On Fri, Dec 14, 2018 at 09:37:09AM +, Harald van Dijk wrote:


Still, that works as well, doesn't it? The control flow is basically the
same so the logic in my other message applies. returncmd() doesn't use
exceptions, so you get to dotrap() which already resets exitstatus if
necessary.


returncmd itself doesn't jump up but it may be part of a subshell
which would execute a jump to the top as part of the EV_EXIT
optimisation.  That's where you'd need to restore the exit status.


Ah, okay, so for something like

   trap 'false; (return); echo $?' EXIT

you want to remember that it's executed as part of a trap action and 
print 0, not 1.


I actually want that to print 1, so for me it's not a problem. However, 
my patch can be trivially modified to handle that: just have returncmd() 
check savestatus the same way exitcmd() does. I think that also allows 
merging SKIPFUNC and SKIPFUNCDEF, since you can then assume exitstatus 
was always set appropriately, so it allows for some further reduction in 
code.


I think this is needed regardless to fix a bug as well:

  trap 'f() { false; return; }; f; echo $?' EXIT

Other shells are evenly divided, but POSIX clearly requires this to 
print 0 (which is what bash/ksh/mksh/yash do): calling a function does 
not reset any trap information, so this return statement must be 
considered to execute in a trap action. It currently prints 1 (also seen 
in bosh/pdksh/posh/zsh). With my suggested changes, it prints 0.


Re: [v2 PATCH] eval: Only restore exit status on exit/return

2018-12-14 Thread Harald van Dijk

On 14/12/2018 09:47, Herbert Xu wrote:

On Fri, Dec 14, 2018 at 09:37:09AM +, Harald van Dijk wrote:


Still, that works as well, doesn't it? The control flow is basically the
same so the logic in my other message applies. returncmd() doesn't use
exceptions, so you get to dotrap() which already resets exitstatus if
necessary.


returncmd itself doesn't jump up but it may be part of a subshell
which would execute a jump to the top as part of the EV_EXIT
optimisation.  That's where you'd need to restore the exit status.


Ah, okay, so for something like

  trap 'false; (return); echo $?' EXIT

you want to remember that it's executed as part of a trap action and 
print 0, not 1.


I actually want that to print 1, so for me it's not a problem. However, 
my patch can be trivially modified to handle that: just have returncmd() 
check savestatus the same way exitcmd() does. I think that also allows 
merging SKIPFUNC and SKIPFUNCDEF, since you can then assume exitstatus 
was always set appropriately, so it allows for some further reduction in 
code.


Re: [v2 PATCH] eval: Only restore exit status on exit/return

2018-12-14 Thread Harald van Dijk

On 14/12/2018 08:02, Harald van Dijk wrote:

On 14/12/2018 03:10, Herbert Xu wrote:

On Sat, Dec 08, 2018 at 03:45:11PM +, Harald van Dijk wrote:


--- a/src/eval.c
+++ b/src/eval.c
@@ -116,10 +116,7 @@ INCLUDE "eval.h"
  EXITRESET {
  evalskip = 0;
  loopnest = 0;
-    if (savestatus >= 0) {
-    exitstatus = savestatus;
-    savestatus = -1;
-    }
+    savestatus = -1;
  }
  #endif


The reason this is needed is to support a naked return.  With
your patch a naked return would either have to always return the
saved status or the new status.  Neither of which is what we
want.


I actually saw the commit you referenced already and tested that with my 
patch before sending. This is how I tested it:


Returns 1:

   f() {
     false; return
   }
   f

Returns 0:

   f() {
     trap "false; return" USR1
     kill -USR1 $$
   }
   f

The former picks up the status of the false command, the latter of the 
kill command. This works because although returncmd() only looks at 
exitstatus, so returns the wrong value, in the no-argument version it 
sets skip to SKIPFUNCDEF, causing dotrap() to restore the original 
exitstatus value, whereas in the version that does take arguments, skip 
is set to SKIPFUNC, causing dotrap() to leave exitstatus alone.


Which is exactly how that particular commit was supposed to work in the 
first place, so perhaps it's simpler to put it as "in these tests, 
changes to exitreset() do not cause the test to break because 
exitreset() is never called at the wrong time".


If there's some test that breaks, it would need to be one where the 
original exitstatus needs to be restored by exitreset(), *and* the 
original exitstatus is not already restored by dotrap().


The original exitstatus is not restored by dotrap() only if evalskip == 
SKIPFUNC (i.e. return with argument), in which case it shouldn't be 
restored, or it exits with an exception. And exitreset() is only called 
when control gets back to main() via an exception.


So you'd need an example where an exception is raised after the return 
command successfully completes without arguments, yet before the 
function the return command is in (if any) actually returns, where that 
exception would then reach main(). I struggle to come up with such an 
example.


Re: [v2 PATCH] eval: Only restore exit status on exit/return

2018-12-14 Thread Harald van Dijk

On 14/12/2018 03:10, Herbert Xu wrote:

On Sat, Dec 08, 2018 at 03:45:11PM +, Harald van Dijk wrote:


--- a/src/eval.c
+++ b/src/eval.c
@@ -116,10 +116,7 @@ INCLUDE "eval.h"
  EXITRESET {
evalskip = 0;
loopnest = 0;
-   if (savestatus >= 0) {
-   exitstatus = savestatus;
-   savestatus = -1;
-   }
+   savestatus = -1;
  }
  #endif


The reason this is needed is to support a naked return.  With
your patch a naked return would either have to always return the
saved status or the new status.  Neither of which is what we
want.


I actually saw the commit you referenced already and tested that with my 
patch before sending. This is how I tested it:


Returns 1:

  f() {
false; return
  }
  f

Returns 0:

  f() {
trap "false; return" USR1
kill -USR1 $$
  }
  f

The former picks up the status of the false command, the latter of the 
kill command. This works because although returncmd() only looks at 
exitstatus, so returns the wrong value, in the no-argument version it 
sets skip to SKIPFUNCDEF, causing dotrap() to restore the original 
exitstatus value, whereas in the version that does take arguments, skip 
is set to SKIPFUNC, causing dotrap() to leave exitstatus alone.


Re: [v2 PATCH] eval: Only restore exit status on exit/return

2018-12-08 Thread Harald van Dijk

On 06/12/2018 21:35, Harald van Dijk wrote:

On 04/12/2018 23:57, Harald van Dijk wrote:
This has the benefit of fixing one other test case, a small 
modification from one of Martijn Dekker's:


   $SHELL -c 'trap "set -o bad@option" INT; kill -s INT $$' && echo BUG


Another test case, one that still fails:

   trap exit INT
   trap 'true; kill -s INT $$' EXIT
   false

Here, inside the EXIT handler, "the command that executed immediately 
preceding the trap action" is `false`, but inside the INT handler, it's 
either `true` or `kill -s INT $$` (I think the latter, but it doesn't 
matter). dash treats it as if it were still `false`.


The attached patch fixes this. In short, it assumes that if the EXIT 
action raises an exception, exitstatus will have been set appropriately, 
and modifies exitcmd() to set it. It also simplifies dotrap() by 
removing the unnecessary special handling of recursive calls.


It handles the following test cases, from this thread:

exec 2>/dev/null
$SHELL -c 'trap "(false) && echo BUG1" INT; kill -s INT $$'
$SHELL -c 'trap "(false) && echo BUG2" EXIT'
$SHELL -c 'trap "(false); echo \$?" EXIT'
$SHELL -c 'trap "(true) || echo BUG3" INT; false; kill -s INT $$'
$SHELL -c 'trap "(true) || echo BUG4" EXIT; false'
$SHELL -c 'trap "(true); echo \$?" EXIT; false'
$SHELL -c 'trap "(! :) && echo BUG5" EXIT'
$SHELL -c 'trap "(false) && echo BUG6" EXIT'
$SHELL -c 'trap "readonly foo=bar; (foo=baz) && echo BUG7" EXIT'
$SHELL -c 'trap "(set -o bad@option) && echo BUG8" EXIT'
$SHELL -c 'trap "set -o bad@option" EXIT' && echo BUG9
$SHELL -c 'trap "set -o bad@option" INT; kill -s INT $$' && echo BUG10
$SHELL -c 'trap exit INT; trap "true; kill -s INT $$" EXIT; false' || 
echo BUG11


It does not change the behaviour for these test cases, also from this 
thread:


$SHELL -c 'trap "(trap \"echo exit\" EXIT; :)" EXIT'
$SHELL -c 'trap "(:; exit) && echo exit" EXIT; false'

It can be combined with the other patches to also change the behaviour 
of those two.
--- a/src/eval.c
+++ b/src/eval.c
@@ -116,10 +116,7 @@ INCLUDE "eval.h"
 EXITRESET {
 	evalskip = 0;
 	loopnest = 0;
-	if (savestatus >= 0) {
-		exitstatus = savestatus;
-		savestatus = -1;
-	}
+	savestatus = -1;
 }
 #endif
 
--- a/src/main.c
+++ b/src/main.c
@@ -348,7 +348,9 @@ exitcmd(int argc, char **argv)
 		return 0;
 
 	if (argc > 1)
-		savestatus = number(argv[1]);
+		exitstatus = number(argv[1]);
+	else if (savestatus >= 0)
+		exitstatus = savestatus;
 
 	exraise(EXEXIT);
 	/* NOTREACHED */
--- a/src/trap.c
+++ b/src/trap.c
@@ -320,17 +320,14 @@ void dotrap(void)
 	char *p;
 	char *q;
 	int i;
-	int status, last_status;
+	int status;
 
 	if (!pending_sig)
 		return;
 
 	status = savestatus;
-	last_status = status;
-	if (likely(status < 0)) {
-		status = exitstatus;
-		savestatus = status;
-	}
+	savestatus = exitstatus;
+
 	pending_sig = 0;
 	barrier();
 
@@ -350,10 +347,10 @@ void dotrap(void)
 			continue;
 		evalstring(p, 0);
 		if (evalskip != SKIPFUNC)
-			exitstatus = status;
+			exitstatus = savestatus;
 	}
 
-	savestatus = last_status;
+	savestatus = status;
 }
 
 
@@ -390,8 +387,10 @@ exitshell(void)
 
 	savestatus = exitstatus;
 	TRACE(("pid %d, exitshell(%d)\n", getpid(), savestatus));
-	if (setjmp(loc.loc))
+	if (setjmp(loc.loc)) {
+		savestatus = exitstatus;
 		goto out;
+	}
 	handler = 
 	if ((p = trap[0])) {
 		trap[0] = NULL;


Re: [PATCH] clear_traps: reset savestatus

2018-11-29 Thread Harald van Dijk

On 29/11/2018 20:28, Martijn Dekker wrote:

Op 29-11-18 om 20:56 schreef Harald van Dijk:
That's part of it, but not the whole story. Herbert Xu's comment about 
exitshell() was right, that is still a problem. A testcase for this is


   trap '(true) || echo bug' EXIT


Yes. Thanks. I hadn't thought about that.

The test case above is not quite complete. It needs to make sure the 
exit status is non-zero before executing the trap:


$ src/dash -c "trap '(true) || echo bug' EXIT; false"
bug


I had intended it as a testcase for dash with your change applied, in 
which case it fails even without making sure of that since you hit 
_exit(savestatus) with savestatus reset to -1, but I like that it's so 
easy to modify to also fail on unpatched dash, so thanks for that.


By the way, my change has an unintended but possibly acceptable side effect:

  trap '(trap "echo exit" EXIT; :)' EXIT

This prints nothing with current dash, but prints "exit" with my change. 
It also prints "exit" in ksh, mksh, posh, and bosh.



- Martijn



Re: [PATCH] clear_traps: reset savestatus

2018-11-29 Thread Harald van Dijk

On 29/11/2018 15:45, Martijn Dekker wrote:

Op 27-11-18 om 17:24 schreef Martijn Dekker:

Big bad bug: it appears that subshells always return status 0 in traps.


As posted elsewhere, looks like the problem is simply that savestatus 
("/* exit status of last command outside traps */") isn't reset to -1 
upon resetting traps when forking a subshell.


That's part of it, but not the whole story. Herbert Xu's comment about 
exitshell() was right, that is still a problem. A testcase for this is


  trap '(true) || echo bug' EXIT

Your patch can be extended to handle that though, by skipping 
exitshell()'s handler if savestatus got reset to -1.


Cheers,
Harald van Dijk
diff --git a/src/trap.c b/src/trap.c
index ab0ecd46..eef44f1d 100644
--- a/src/trap.c
+++ b/src/trap.c
@@ -169,6 +169,7 @@ clear_traps(void)
 	}
 	trapcnt = 0;
 	INTON;
+	savestatus = -1;
 }
 
 
@@ -387,11 +388,18 @@ exitshell(void)
 {
 	struct jmploc loc;
 	char *p;
+	struct jmploc *volatile savehandler;
 
 	savestatus = exitstatus;
 	TRACE(("pid %d, exitshell(%d)\n", getpid(), savestatus));
-	if (setjmp(loc.loc))
+	savehandler = handler;
+	if (setjmp(loc.loc)) {
+		if (savestatus == -1) {
+			handler = savehandler;
+			longjmp(handler->loc, 1);
+		}
 		goto out;
+	}
 	handler = 
 	if ((p = trap[0])) {
 		trap[0] = NULL;


Re: [PATCH] var: ensure variables are fully initialised when unset

2018-11-12 Thread Harald van Dijk

On 12/11/2018 12:53, Ron Yorston wrote:

When a variable is unset by calling setvar(name, 0, 0) the code
to initialise the new, empty variable omits the trailing '='.


It's supposed to. A trailing = means the variable is set to an empty 
string. That's different from unset. You can see the difference with

set -u, or with ${var+set}. However, ...


Attempts to read the contents of the unset variable will result
in the uninitialised character at the end of the string being
accessed.


...this is indeed a bug which I've noticed as well. The code needs two 
trailing null bytes, not just one. Because of glibc internals, the 
out-of-bounds byte being read will almost certainly be zero on x86-64, 
but it's not a guarantee, and it could probably break more easily on 
other platforms.


It only affects shell-internal uses of variables, only for variables 
explicitly unset by a script (rather than unset by default), only for 
uses where the code does not explicitly check for unset beforehand. As 
far as scripts go, that just means PATH (as you found) I think, for 
interactive shells there are some more variables such as PS1/PS2/PS4/MAIL.


My patch is attached.

Cheers,
Harald van Dijk
diff --git a/src/var.c b/src/var.c
index 0d7e1db0..9163cca9 100644
--- a/src/var.c
+++ b/src/var.c
@@ -206,9 +206,9 @@ struct var *setvar(const char *name, const char *val, int flags)
 		vallen = strlen(val);
 	}
 	INTOFF;
-	p = mempcpy(nameeq = ckmalloc(namelen + vallen + 2), name, namelen);
+   p = mempcpy(nameeq = ckmalloc(namelen + vallen + 2), name, namelen + 1);
 	if (val) {
-		*p++ = '=';
+   p[-1] = '=';
 		p = mempcpy(p, val, vallen);
 	}
 	*p = '\0';


Re: Negative LINENO possible with indirection

2018-09-18 Thread Harald van Dijk

On 17/09/2018 23:46, Max Rees wrote:

Hello,

While running the shunit2 test suite[1][2] against dash, the following
problem presented itself: when multiple layers of indirection are used,
$LINENO yields a negative number. It appears that the root of the
problem is that lineno is decremented in multiple places in src/eval.c
with no lower bound, but I wasn't sure how to fix that. A minimal test
case follows below.


Hi,

A more minimal test case is

  :
  :
  f() { eval echo \$LINENO; }
  f

When this script is parsed, the function definition gets an associated 
line number 3, and the eval command invocation also gets an associated 
line number 3. POSIX says



LINENO
Set by the shell to a decimal number representing the current sequential 
line number (numbered starting with 1) within a script or function before it 
executes each command. If the user unsets or resets LINENO, the variable may 
lose its special meaning for the life of the shell. If the shell is not 
currently executing a script or function, the value of LINENO is unspecified. 
This volume of POSIX.1-2017 specifies the effects of the variable only for 
systems supporting the User Portability Utilities option.


I have interpreted this as saying that when a script defines a function, 
inside that function, lines are numbered starting from 1 again. So when 
the eval command is executed, LINENO should be set to 1. This is 
achieved by subtracting the function definition's line number (less one).


This is also how zsh has interpreted the spec. However, most other 
shells agree with bash, which says:



x.  The expansion of $LINENO inside a shell function is only relative to the
function start if the shell is interactive -- if the shell is running a
script, $LINENO expands to the line number in the script.  This is as
POSIX-2001 requires.


Continuing with the original interpretation:

When the eval command executes, the echo command is parsed in isolation. 
It gets an associated line number 1. But it is executed in the context 
of the function f, so f's associated line number is taken off again. The 
result is 1 - (3 - 1) == -1.


There are multiple possible approaches to change this behaviour, 
depending on the intended behaviour.


1: Function definitions could be ignored for LINENO purposes if the 
function definitions are in a script. Since that takes out the 
subtraction, it would at least get rid of the negative numbers.
2: Function definitions' commands could be ignored for LINENO purposes. 
This preserves the outer-most command's line number. Since that also 
takes out the subtraction, it would at least get rid of the negative 
numbers.
3: eval's commands could be executed as if they are not in the context 
of a function. This would mean that eval 'echo $LINENO' always prints 1.
4: eval's commands could be executed as if they are in the context of a 
function with a faked associated line number, so that echo $LINENO and 
eval 'echo $LINENO' always print the same number, but eval 'echo $LINENO

echo $LINENO' prints two different line numbers.
5: The commands eval sees after parsing could get a fixed associated 
line number, so that eval 'echo $LINENO

echo $LINENO' always prints '0 0' or '1 1'.
6: The commands eval sees after parsing could get eval's associated line 
number, so that echo $LINENO and eval 'echo $LINENO' always print the 
same number, and eval 'echo $LINENO
echo $LINENO' prints the same line number twice. As long as the user 
never modifies LINENO, this can also be equivalently stated as: the 
commands eval sees after parsing could avoid setting LINENO.


Testing the behaviour of various shells shows, taking dash as the baseline:

bash  | 1 4
bosh  | 2 5
ksh   | 1 3
mksh  | 1 6
pdksh | 1 5
posh  | 1 5
yash  | 1 3
zsh   | 6

Personally, I am in favour of 1. It is more sensible behaviour and I 
only implemented it the way it is now because of what POSIX said. If a 
different, equally reasonable interpretation can get a more desirable 
outcome, I would suggest going with that interpretation. It involves a 
bit more than just taking out the subtractions: it needs a bit of 
thinking to figure out how to ensure LINENO does get set properly in 
functions in interactive shells.


As for the rest, unless there are strong reasons to go after a 
particular result, since POSIX says the results are unspecified, I would 
suggest going with whatever is the simplest to implement.


Cheers,
Harald van Dijk


Please CC me on replies - I'm not subscribed.

Max


Re: [BUG] failure to push/restore closed file descriptor

2018-09-12 Thread Harald van Dijk

On 23/04/2018 19:56, Martijn Dekker wrote:

$ dash -c '{ exec 8Apparently, dash either fails to push the file descriptor onto the stack 
at '} 8<&-', or fails to restore it.


Same bug with loops ending in "done 8<&-".

Confirmed in all dash versions down to 0.5.5.1.


What surprises me most is that dash has code written specifically to 
keep the fd closed. dash would be smaller and simpler if it behaved the 
way you expected and the way most other shells behave: just remove all 
traces of REALLY_CLOSED.


FWIW, this also happens with n< which is ignored entirely. The 
behaviour I would expect for that (and which I have implemented, but 
which I am pretty sure has no chance anyway of getting into dash) is to 
issue an error message if that fd is not open, and to save and later 
restore the fd if it is open.


Test cases:

  : 8<&8

Assuming fd 8 does not happen to be open already, bash and dash are the 
only shells I've tested which accept this. ksh93, mksh, pdksh, yash, 
zsh, bosh all reject it.


  echo ok | { { exec 0<&-; } 0<&0; cat; }

Here, bash and dash are in agreement again, but this time they're not 
alone: mksh and posh also cause an error message by leaving stdin 
closed. The other shells restore stdin and print ok.


There is one more corner case on the subject of redirections, which 
behaves unexpectedly in two shells in two different ways:


  echo ok | cat 0>&0

Almost all shells are okay with this. The exceptions are bosh and yash: 
yash rejects the redirection as fd 0 is not writeable, bosh appears to 
mishandle all 0>* redirections, instead treating them as 1>*.


Cheers,
Harald van Dijk


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 3/29/18 6:42 AM, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 03:06:32PM +0200, Harald van Dijk wrote:



Since bash itself is inconsistent, and POSIX unclear,

What exactly is unclear about the sentence of POSIX that I quoted? Is there
a legitimate interpretation of "It is unspecified whether A or B" that
allows other behaviour than A or B?


The tricky part is coming up with a way to actually produce an
unescaped backslash.


You're right that what other shells implement doesn't allow any 
possibility of unescaped backslashes in the shell. But if what they do 
is what's intended, the sentence still makes perfect sense to me: it 
simply wouldn't affect the shell. Remember that the description of 
pattern matching applies to both the shell and the non-shell cases. For 
the non-shell cases, it's trivial, and it's possible that that was all 
it was intended to address:


  find . -name 'a\'

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 13:00, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:53:31PM +0200, Harald van Dijk wrote:


[un-snip]

  If a pattern ends with an unescaped , it is unspecified whether the pattern does not match anything or the pattern is treated as invalid. 



I don't think this allows dash's behaviour of taking the backslash as a
literal, since that still allows a match to succeed. bash lets such a
pattern never match. In other shells, there is no way to get into this
situation, but GNU find behaves the same as bash.


Nope, bash treats it as if the backslash is not there.

$ pwd
/home/dash
$ touch asdf\\
$ touch asdf
$ bash -c 'v="../da*sh/asdf\\"; printf "%s\n" $v'
../dash/asdf
$


Huh. Indeed in that case, but at the same time:

  bash -c 'v="?sdf\\"; printf "%s\n" $v'

Here, it will not be considered to match, this just prints ?sdf\. I 
suspect bash has the same kind of logic as dash, where it just tries to 
lstat() the file if the final pathname component contains no 
metacharacters or goes through an opendir()/readdir()/closedir() 
sequence if it does, and mishandles this corner case.


[...]

Since bash itself is inconsistent, and POSIX unclear,
What exactly is unclear about the sentence of POSIX that I quoted? Is 
there a legitimate interpretation of "It is unspecified whether A or B" 
that allows other behaviour than A or B?


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 12:37, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:03:24PM +0200, Harald van Dijk wrote:


When expanded backslashes are no longer treated as quoted, this would call
expmeta() with the pattern *\, that is with a single unquoted trailing
backslash, so:
[...]


Thanks for the explanation.  Here is an updated patch.


That seems consistent with what pmatch() does for trailing unquoted 
backslashes, but actually, I think pmatch() is wrong on that too. POSIX 
says:


  If a pattern ends with an unescaped , it is unspecified 
whether the pattern does not match anything or the pattern is treated as 
invalid.


I don't think this allows dash's behaviour of taking the backslash as a 
literal, since that still allows a match to succeed. bash lets such a 
pattern never match. In other shells, there is no way to get into this 
situation, but GNU find behaves the same as bash.


Test case:

  v=\\; case \\ in $v) echo bug;; esac

bash prints nothing, dash 0.5.9.1 prints bug. Other shells don't count 
since they interpret $v differently.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 11:52, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 08:44:28AM +0200, Harald van Dijk wrote:


Test case:

   $v='*\'
   set -- $v


I don't see how this would cause an overrun, can you please explain
it for me?


Line numbers are from 0.5.9.1.

When expanded backslashes are no longer treated as quoted, this would 
call expmeta() with the pattern *\, that is with a single unquoted 
trailing backslash, so:


expand.c:1333

if (*p == '\\')
esc++;
if (p[esc] == '/') {

The first if statement will be hit and set esc to 1. p[esc] is then 
'\0', so the second if block doesn't get entered and the outer loop 
continues:


expand.c:1315

for (p = name; esc = 0, *p; p += esc + 1) {

p += esc + 1 will increase p by 2, letting it point just past the 
terminating '\0'. The loop condition of *p now accesses the byte just 
past the pattern.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 10:16, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 09:38:04AM +0200, Harald van Dijk wrote:


Also, it's just as logical to restore the case pattern matching to its
traditional behaviour to align it with pathname expansion.


No I think that makes no sense and clearly contradicts POSIX.


That's not clear at all. As I already wrote earlier and Jilles Tjoelker 
chimed in with as well, if there isn't a single shell out there that 
actually implements what POSIX specifies (bash gets close, but isn't 
fully there either), not even the shells on which the POSIX 
specification was based, there's a fair chance it's a POSIX defect, 
regardless of whether it makes sense.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 09:31, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 08:52:57AM +0200, Harald van Dijk wrote:


I seriously cannot understand the logic of pushing a break from traditional
ash behaviour and from all shells apart from bash, giving POSIX as a reason
for doing it, and then giving the behaviour of all those other shells as a
reason for not doing it properly.


It's logical for dash because this aligns the behaviour of pathname
expansion with case pattern matching.


Except it doesn't actually do that. It does in some cases, and not in 
others. You've made it clear that you don't care about the cases where 
it doesn't. That can be a valid decision, but it's one you should 
acknowledge.


Also, it's just as logical to restore the case pattern matching to its 
traditional behaviour to align it with pathname expansion.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 08:18, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 08:38:01PM +0200, Harald van Dijk wrote:



My inclination is to just drop the /d\ev issue and use the bash
behaviour until such a time that bash changes or POSIX changes.


What would need to change in POSIX to convince you to change dash to match
what you were already arguing is the POSIX-mandated behaviour? :)


No the passage I quoted only mandates that backslashes be
interpreted when doing the matching.  It doesn't say anything
about whether it should be removed after matching is done.  The
only thing it does say is that:

If the pattern does not match any existing filenames or pathnames,
the pattern string shall be left unchanged.

IOW it's silent in the case where the pattern does match.


Of course it doesn't say whether characters in the pattern are preserved 
in the case of a match. That's because when there's a match, pathname 
expansion produces the file name, not the original pattern.


In a directory containing foo.txt, you wouldn't argue that *.txt might 
expand to *foo.txt just because POSIX doesn't explicitly say the * is 
removed, would you?



However, the other reason this is not worth fixing is because all
those other shells that always treat the backslash as a literal
won't remove it in this case anyway.


I seriously cannot understand the logic of pushing a break from 
traditional ash behaviour and from all shells apart from bash, giving 
POSIX as a reason for doing it, and then giving the behaviour of all 
those other shells as a reason for not doing it properly.


If all those other shells are a reason against doing it, then why isn't 
that a reason for just restoring the dash 0.5.4 behaviour that all those 
other shells implement, always taking expanded backslash as effectively 
quoted?



  So nor sensible script could
possibly use such a feature even if we implemented it.


No portable script could use such a feature. A sensible script certainly 
could. There are a lot of scripts that aren't meant to be portable 
across shells. I know I've written scripts for my own use that use dash 
features that don't work the same in bash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-28 Thread Harald van Dijk

On 28/03/2018 08:23, Herbert Xu wrote:

On Wed, Mar 28, 2018 at 12:19:17AM +0200, Harald van Dijk wrote:


This introduces a buffer overread. When expmeta() sees a backslash, it
assumes it can just skip the next character, assuming the next character is
not a forward slash. By treating expanded backslashes as unquoted, it
becomes possible for the next character to be the terminating '\0'.


This code has always had to deal with naked backslashes.  Can you
show me the exact pattern that results in the overread?


No, it hasn't, because expmeta() is not used in case patterns, and case 
patterns are currently the only case where naked backslashes can appear. 
In contexts where pathname expansion is performed, a backslash coming 
from a variable will be escaped by another backslash in currently 
released dash versions.


Test case:

  $v='*\'
  set -- $v

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 3/27/18 10:55 AM, Herbert Xu wrote:

So going back to dash yes I think we should adopt the bash behaviour
for pathname expansion and keep the existing case semantics.

This patch does exactly that.  Note that this patch does not work
unless you have already applied

https://patchwork.kernel.org/patch/10306507/

because otherwise the optimisation mentioned above does not get
detected correctly and we will end up doing quote removal twice.


This introduces a buffer overread. When expmeta() sees a backslash, it 
assumes it can just skip the next character, assuming the next character 
is not a forward slash. By treating expanded backslashes as unquoted, it 
becomes possible for the next character to be the terminating '\0'.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 3/27/18 11:32 PM, Jilles Tjoelker wrote:

I don't think it is clear at all. Note the final paragraph of 2.13.1:

] When pattern matching is used where shell quote removal is not
] performed [...]

This implies that special characters cannot be escaped using a backslash
in a context where shell quote removal is performed.


Taken literally, it just says something about what happens when shell 
quote removal is not performed. In the cases we're talking about, shell 
quote removal is performed, so this simply wouldn't apply. Perhaps 
that's taking it more literally than intended, I don't know.



 Instead, special
characters can be escaped using shell quoting. As a result, the simplest
form of parameter expansion has either all or none of the generated
characters quoted (depending on whether the expansion is in
double-quotes or not).

There is also a sentence "The shell special characters always require
quoting." which seems untrue in practice if the characters come from an
expansion. Something like

   touch 'a'
   sh -c 'w=*\&*; printf "%s\n" $w'

works for many shells as sh. However, this could be explained away as
undefined behaviour.


This is what allows extensions to glob syntax, if those extensions use 
shell special characters.


  p="*(ab)"; case abab in $p) echo match ;; esac

This prints "match" in ksh93.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 20:24, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 08:01:10PM +0200, Harald van Dijk wrote:



Right.  I guess we could change it so that CTLESC is preserved
to distinguish between the backslashes from parameter expansion
vs. backslashes from open input.


Thinking about it some more, see below.


Actually let's not go down this route because this would mean
that the glob(3) path will never have the same functionality
since it cannot possibly see CTLESC.


Yes, that's why I ended up proposing what I put below, which does allow 
them the same functionality. Neither glob() nor expmeta() would see 
CTLESC, and neither would need to, since neither would need to optimise 
for the no-metachars case.



It was just the most basic command I could think of that shouldn't hit the
FS, currently doesn't hit the FS, and would start hitting the FS if
backslash gets taken as a metacharacter. Anything containing a quoted
metacharacter would do. I suppose I could have used echo "[  ok  ]" instead
for something that's more likely to pop up in a real script.


My inclination is to just drop the /d\ev issue and use the bash
behaviour until such a time that bash changes or POSIX changes.


What would need to change in POSIX to convince you to change dash to 
match what you were already arguing is the POSIX-mandated behaviour? :)


More serious response: I do think it's worth raising this as a bash bug 
too and seeing what they do.



It's such an unlikely case as why would anyone knowingly put
backslash into a variable and expecting pathname expansion to
remove it without actually using real magic characters.


That's true. The least unlikely scenario I can think of is a directory 
name containing [ but not ]. A script may end up escaping the [ in that 
case just as a precaution, even though it's not strictly needed, and 
glob() may return early due to GLOB_NOMAGIC picking up on the fact that 
[ is not magic in that context.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 19:02, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 06:48:45PM +0200, Harald van Dijk wrote:


Backslashes coming from parameters, sure, but backslashes introduced by
preglob(), I'm not so sure.


Right.  I guess we could change it so that CTLESC is preserved
to distinguish between the backslashes from parameter expansion
vs. backslashes from open input.


Thinking about it some more, see below.


The [ character would mark the whole string possibly-meta in expandmeta(),
and the CTLESC wouldn't be seen there. Then, preglob() turns it into \[, and
expmeta() wouldn't take [ as a meta character, but would take \ as a meta
character.


Oh I though you were talking about test/[, I didn't see the set.
But why would someone do set "["?


It was just the most basic command I could think of that shouldn't hit 
the FS, currently doesn't hit the FS, and would start hitting the FS if 
backslash gets taken as a metacharacter. Anything containing a quoted 
metacharacter would do. I suppose I could have used echo "[  ok  ]" 
instead for something that's more likely to pop up in a real script.



Then, how about moving some of expmeta()'s checks to return early outside of
it, so that they can also be done in the glob() case?


We could.


expandmeta() is implemented twice, once for the glob() case, once for 
the expmeta() case. There is not much code shared between them except 
for preglob(). preglob(), through _rmescapes(), already has to go over 
the whole string and check each character. If that can be made to return 
an indication somehow of whether it has seen metacharacters, that could 
be used for both cases. Perhaps an additional flag that, when set, lets 
it return NULL (and freeing the allocated string) if no metacharacters 
were seen? That means expmeta() doesn't need to learn about CTLESC, 
means there's no need to go over the string a second time in the glob() 
case to really turn CTLESC into \, and means GLOB_NOMAGIC becomes 
unnecessary and can be removed as a workaround for glibc bugs.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 18:04, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 05:54:53PM +0200, Harald van Dijk wrote:


I was thinking about not making backslashes set metaflag in expmeta(): when
the pathname component doesn't include *, ?, or [, but does include
backslashes, then the if (metaflag == 0) block could handle that as long as
it performs the lstat64() check unconditionally. There's no need to go
through the opendir()/readdir()/closedir() path for that case. Since
expmeta() is bypassed for words not containing any potentially-magic
characters, the impact might be small enough.


Honestly such backslashes should be rare enough that I wouldn't
bother with such an optimisation.


Backslashes coming from parameters, sure, but backslashes introduced by 
preglob(), I'm not so sure.



Regardless of whether metaflag is set, it would mean things like 'set "["'
would start hitting the FS unnecessarily, if I understand correctly: the
preglob()-expanded pattern is '\[', and expmeta() can no longer tell this
apart from $v where v='\[' where a FS check would be required.


No it shouldn't.  We only mark [ is meta if there is a matching ].


The [ character would mark the whole string possibly-meta in 
expandmeta(), and the CTLESC wouldn't be seen there. Then, preglob() 
turns it into \[, and expmeta() wouldn't take [ as a meta character, but 
would take \ as a meta character.



Perhaps preglob() should just be avoided, and expmeta() taught to respect
both '\\' and CTLESC. '\\' would be a metacharacter and require a FS hit,
CTLESC wouldn't require it. This would also avoid some memory allocations.


We need preglob for glob(3).  I want to minimise the amount of code
difference between the glob path and the expmeta path.


Then, how about moving some of expmeta()'s checks to return early 
outside of it, so that they can also be done in the glob() case?



 if (!strpbrk(str->text, metachars))
 goto nometa;

in expandmeta() is done before preglob() is called. Here, it is still not
necessary to include CTLESC.


We could move it after preglob.


Assuming backslash is added to metachars, which was needed anyway:

Regardless of whether the check is done before or after preglob(), it 
would never skip expmeta() when it shouldn't. If it's kept before 
preglob(), then it will correctly skip expmeta() in more cases than if 
it's moved after it, so I don't think there's any benefit in moving it.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 17:14, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 02:57:12PM +0200, Harald van Dijk wrote:


That's a good point and wouldn't have too much of an impact on performance
of existing scripts. BTW, that means both expandmeta()'s metachars variable,
and expmeta()'s

   if (metaflag == 0 || lstat64(expdir, ) >= 0)

optimisation. You already got rid of the latter in your patch to prevent
buffer overflows.


No the purpose of this metaflag check is to bypass the lstat call.
My patch simply made it bypass the call by returning earlier.


Oh, right.


It is not relevant to whether we include backslash as a metacharacter.


I was thinking about not making backslashes set metaflag in expmeta(): 
when the pathname component doesn't include *, ?, or [, but does include 
backslashes, then the if (metaflag == 0) block could handle that as long 
as it performs the lstat64() check unconditionally. There's no need to 
go through the opendir()/readdir()/closedir() path for that case. Since 
expmeta() is bypassed for words not containing any potentially-magic 
characters, the impact might be small enough.


Regardless of whether metaflag is set, it would mean things like 'set 
"["' would start hitting the FS unnecessarily, if I understand 
correctly: the preglob()-expanded pattern is '\[', and expmeta() can no 
longer tell this apart from $v where v='\[' where a FS check would be 
required.


Perhaps preglob() should just be avoided, and expmeta() taught to 
respect both '\\' and CTLESC. '\\' would be a metacharacter and require 
a FS hit, CTLESC wouldn't require it. This would also avoid some memory 
allocations.



In regular /d\ev that doesn't come from a variable, the backslash will have
been replaced by CTLESC, but it's not necessary to treat CTLESC as a
metacharacter: in that case, either there's a match and pathname expansion
produces /dev, or there's no match and quote removal produces /dev, so the
optimisation is still valid. It'd only affect backslashes that come from a
variable, so it's very limited in scope.


For the case where the backslash came from the parser, the CTLESC
would have already been removed by preglob so there should be no
difference.


if (!strpbrk(str->text, metachars))
goto nometa;

in expandmeta() is done before preglob() is called. Here, it is still 
not necessary to include CTLESC.



The only problem with the metacharacter approach is that glibc's
glob(3) probably won't treat it as a metacharacter and therefore
it will still behave in the same way as the current bash.


This doesn't matter much yet, but yeah, it will if you decide to enable 
--enable-glob by default in the future. As long as you don't include 
GLOB_NOESCAPE, then normally, it should be taken as escaping the next 
character, but it might still trigger GLOB_NOMAGIC to return early. If 
it does, I'd take that as a glibc bug, but that doesn't help dash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 14:19, Herbert Xu wrote:

Nowhere does it say that backslashes that come from the result of
parameter expansion is always literal.

So it's clear that any shell that treats the backslash as a literal
backslash is already broken.


By the literal POSIX wording, yes, but the fact remains that that's what 
all shells aside from bash were already doing, so that may mean it's a 
POSIX defect. That's why I qualified with "POSIX specifies and/or 
intends to specify".


But I'm tempted to agree with your and Martijn's point of view now that 
it's better to just implement it the way POSIX specifies, that it's more 
useful that way.



On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote:

Can you show me any shell other than bash that lets this optimisation affect
the results?


The fact is that the other shells are not doing what they do because
they're not doing this optimisation.  They do what they do because
they have broken POSIX because they always treat backslashes that
arise from parameter expansion as literal backslashes.


Right, but they're doing this optimisation as well.


FWIW it wouldn't be hard to iron out the anomalous case of /d\ev.
You just have treat the backslash as a metacharacter.


That's a good point and wouldn't have too much of an impact on 
performance of existing scripts. BTW, that means both expandmeta()'s 
metachars variable, and expmeta()'s


  if (metaflag == 0 || lstat64(expdir, ) >= 0)

optimisation. You already got rid of the latter in your patch to prevent 
buffer overflows.


In regular /d\ev that doesn't come from a variable, the backslash will 
have been replaced by CTLESC, but it's not necessary to treat CTLESC as 
a metacharacter: in that case, either there's a match and pathname 
expansion produces /dev, or there's no match and quote removal produces 
/dev, so the optimisation is still valid. It'd only affect backslashes 
that come from a variable, so it's very limited in scope.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Harald van Dijk

On 27/03/2018 11:44, Herbert Xu wrote:

On Tue, Mar 27, 2018 at 11:16:29AM +0200, Harald van Dijk wrote:


If you say that quote removal takes place on the original token (meaning
before parameter expansion), and if parameter expansion takes place before
pathname expansion, then there's nothing left to allow \* to behave
differently from *.


Either you misunderstood me or you misread POSIX.


I misunderstood you. Given v='\' and the word \\$v, I took the result of 
parameter expansion as \\\ where the first two backslashes come from the 
original word, you take the result of parameter expansion as only \. 
Because of that, when you wrote quote removal is never applied to the 
results of parameter expansion, I didn't see what you meant. I think 
you're right in your terminology, the result of parameter expansion is 
just \, to get \\\ it would need to be phrased as something like "the 
result of applying parameter expansion". Thanks for the clarification.



POSIX never actually says this optimisation is allowed. The only thing that
allows it is the fact that it produces the same results anyway. If that
stops, then checking the file system for matches becomes required.


It doesn't disallow it either.


If POSIX specifies a result, and a shell applies an optimisation that 
causes a different result to be produced, doesn't that inherently 
disallow that optimisation? There may be some confusion and/or 
disagreement over what exactly POSIX specifies and/or intends to 
specify, but I don't see how you can argue that POSIX specifies result 
A, but it's okay to apply an optimisation that produces result B.



Can you show me a shell that doesn't apply this optimisation?


Can you show me any shell other than bash that lets this optimisation 
affect the results?


Like I wrote in my initial message, all other shells I tested take a 
backslash that comes from a variable after parameter expansion as 
effectively quoted. Given your b="/*/\null", bash is the only shell I 
know that expands $b to /dev/null. All others do perform pathname 
expansion, but check to see if a file \null exists in any top-level 
directory.


Given a="/de[v]/\null" and b="/dev/\null", all shells other than bash, 
even dash, agree that $a and $b expand to '/dev/\null' if that file 
exists, or the original string if that file doesn't exist. *That* is 
what allows the optimisation in the case of $b: expanding to either 
'/dev/\null' or '/dev/\null', depending on some condition, can be done 
without evaluating that condition.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Harald van Dijk

On 26/03/2018 11:34, Martijn Dekker wrote:

Op 25-03-18 om 22:56 schreef Harald van Dijk:

   case /dev in $pat) echo why ;; esac

Now, bash and dash say that the pattern does match -- they take the
backslash as unquoted, allowing it to escape the v. Most other shells
(bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
quoted.

This doesn't make sense to me, and doesn't match historic practice:

[...]


With the snipping it's not clear that I was specifically confused by the 
inconsistency.


I had included another example:

  pat="/de\v"
  printf "%s\n" $pat

I can understand treating backslash as quoted, or treating it as 
unquoted, but not quoted-unless-in-a-case-statement. What justifies this 
one exception?


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix buffer overflow in expandmeta

2018-03-26 Thread Harald van Dijk

On 3/26/18 7:12 AM, Herbert Xu wrote:

FWIW your patch when built with -O2 with gcc 4.7.2 is actually
16 bytes bigger than my version.


Interesting. Not sure what might cause that.


@@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name)
metaflag++;
p = name;
do {
+   if (enddir == expdir + PATH_MAX)
+   return;
if (*p == '\\')
p++;
*enddir++ = *p;


Also there is another loop in between these two hunks that also
needs to check enddir.


Indeed, thanks.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix buffer overflow in expandmeta

2018-03-25 Thread Harald van Dijk

On 3/26/18 4:54 AM, Herbert Xu wrote:

On Mon, Mar 26, 2018 at 03:03:38AM +0200, Harald van Dijk wrote:


This was already the case before your patch, but on systems that flat out
reject paths longer than PATH_MAX (that includes GNU/Linux), it seems weird
that expansion can produce paths which then don't work.


PATH_MAX only applies in certain cases.  Besides, you can always
cd into the directories one level at a time to circumvent it.


In other words, when you change the path to a different one than what 
globbing produced. That was kind of my point. :)



Besides, even if we did impose a limit here
we'd still have to check that limit at every recursion level which
means that the code won't be that much simpler.


Proof of concept attached. The resulting code is about 100 bytes smaller 
at -Os than the dynamic reallocation approach. It's possible to further 
reduce that with a statically allocated buffer, but of course that means 
a constant memory cost at run time.


Cheers,
Harald van Dijk
--- a/src/expand.c
+++ b/src/expand.c
@@ -122,7 +122,7 @@ STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
 STATIC void addglob(const glob_t *);
 #else
-STATIC void expmeta(char *, char *);
+STATIC void expmeta(char *);
 STATIC struct strlist *expsort(struct strlist *);
 STATIC struct strlist *msort(struct strlist *, int);
 #endif
@@ -1237,9 +1237,6 @@ addglob(pglob)
 
 
 #else	/* HAVE_GLOB */
-STATIC char *expdir;
-
-
 STATIC void
 expandmeta(struct strlist *str, int flag)
 {
@@ -1261,13 +1258,7 @@ expandmeta(struct strlist *str, int flag)
 
 		INTOFF;
 		p = preglob(str->text, RMESCAPE_ALLOC | RMESCAPE_HEAP);
-		{
-			int i = strlen(str->text);
-			expdir = ckmalloc(i < 2048 ? 2048 : i); /* XXX */
-		}
-
-		expmeta(expdir, p);
-		ckfree(expdir);
+		expmeta(p);
 		if (p != str->text)
 			ckfree(p);
 		INTON;
@@ -1296,7 +1287,7 @@ nometa:
  */
 
 STATIC void
-expmeta(char *enddir, char *name)
+expmeta1(char *expdir, char *enddir, char *name)
 {
 	char *p;
 	const char *cp;
@@ -1344,6 +1335,8 @@ expmeta(char *enddir, char *name)
 			metaflag++;
 		p = name;
 		do {
+			if (enddir == expdir + PATH_MAX)
+return;
 			if (*p == '\\')
 p++;
 			*enddir++ = *p;
@@ -1390,22 +1383,34 @@ expmeta(char *enddir, char *name)
 		if (dp->d_name[0] == '.' && ! matchdot)
 			continue;
 		if (pmatch(start, dp->d_name)) {
+			p = enddir;
+			cp = dp->d_name;
+			for (;;) {
+if (p == expdir + PATH_MAX)
+	goto toolong;
+if ((*p++ = *cp++) == '\0')
+	break;
+			}
 			if (atend) {
-scopy(dp->d_name, enddir);
 addfname(expdir);
 			} else {
-for (p = enddir, cp = dp->d_name;
- (*p++ = *cp++) != '\0';)
-	continue;
 p[-1] = '/';
-expmeta(p, endname);
+expmeta1(expdir, p, endname);
 			}
 		}
+toolong: ;
 	}
 	closedir(dirp);
 	if (! atend)
 		endname[-esc - 1] = esc ? '\\' : '/';
 }
+
+STATIC void
+expmeta(char *name)
+{
+	char expdir[PATH_MAX];
+	expmeta1(expdir, expdir, name);
+}
 #endif	/* HAVE_GLOB */
 
 


Re: expand: Fix buffer overflow in expandmeta

2018-03-25 Thread Harald van Dijk

On 3/25/18 10:38 AM, Herbert Xu wrote:

The native version of expandmeta allocates a buffer that may be
overrun for two reasons.  First of all the size is 1 byte too small
but this is normally hidden because the minimum size is rounded
up to 2048 bytes.  Secondly, if the directory level is deep enough,
any buffer can be overrun.

This patch fixes both problems by calling realloc when necessary.


This was already the case before your patch, but on systems that flat 
out reject paths longer than PATH_MAX (that includes GNU/Linux), it 
seems weird that expansion can produce paths which then don't work.


Test case:

  cd /tmp
  x=x
  x=$x$x$x$x$x$x$x$x$x$x$x$x$x$x$x
  ln -s . $x
  set -- $x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x/$x*
  case $1 in
  *"*") echo "no match" ;;
  *)if test -e "$1"
then echo "match and exists"
else echo "match but does not exist"
fi ;;
  esac
  rm $x

I'm seeing "no match" from bash/mksh/pdksh/posh/yash/zsh, but "match but 
does not exist" from dash/bosh/ksh93. Other commands would naturally 
print the "File name too long" error.


Should the buffer perhaps simply be limited to PATH_MAX, taking care to 
just give up if something longer is found? That could avoid the need to 
handle reallocations and result in somewhat smaller and simpler code.


The downside would be that if other systems do support paths longer than 
PATH_MAX, globbing would no longer work for those paths.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix glibc glob(3) support

2018-03-25 Thread Harald van Dijk

On 3/25/18 10:06 AM, Herbert Xu wrote:

It's been a while since we disabled glob(3) support by default.
It appears to be working now, however, we have to change our
code to remove a workaround that now actually prevents it from
working.

In particular, we no longer use GLOB_NOMAGIC or test GLOB_MAGCHAR.


I thought that was an optimisation to avoid unnecessary file system 
access, to ensure that simple commands such as "echo ok" don't bother 
looking for files named "echo" and "ok" in the current directory.


It indeed doesn't work, but I took that to be a glibc bug.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-24 Thread Harald van Dijk

On 3/22/18 8:12 PM, Harald van Dijk wrote:
I have a local patch set of stuff that I haven't considered in an 
appropriate state to send to the list. One of the features I've got 
working is multibyte locale support. It doesn't assume UTF-8 (only 
assumes stateless encodings) and is tolerant of malformed strings. I've 
got it working in pattern matching, in glob() sorting, in trimming (#, 
##, % and %%), in field splitting, and in "$*" joining. I'll see if I 
can polish it a bit and make it available when I've got some more free 
time again, but I'm not sure how soon that will be.


Actually, there's no real harm in just publishing what I have now.

  https://github.com/hvdijk/dash

This is my personal playground, so I'll have to include some disclaimers:

Commits may take a different approach than what dash is interested in.
Commits may be poorly tested.
Commits may contain bugs.
Commits may have non-obvious dependencies on earlier commits.
Commits may be poor style.
Commits may be re-written and force-pushed.

That said, feel free to see if the locale support does what you're 
after. If so, it should be possible to bring into a shape suitable for dash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 14:14, Herbert Xu wrote:

On Fri, Mar 23, 2018 at 01:52:10PM +0100, Harald van Dijk wrote:


It doesn't, and I didn't say it did. POSIX doesn't care how it's
implemented, POSIX cares about the results produced. When another approach
produces the same results, both approaches are equally correct.


Right, but restoring the old behaviour for white spaces only is
silly.  It's silly
to have this work:

set -- A1 B2 C3
echo ${@%2 C3}

but not this:

set -- A1 B2 C3
IFS=:
echo ${@%2:C3}


Agreed. Both are doable in a POSIX-compatible way, but the second would 
require a different approach: either scanning the strings to suppress 
the addition of the separator character in those cases where its 
presence would cause splitting to produce an additional field, or big 
changes in how regions are recorded and split. Either way, that's a 
large cost for a feature that's likely seen very little real-world use.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 13:20, Herbert Xu wrote:

On Fri, Mar 23, 2018 at 12:19:28PM +0100, Harald van Dijk wrote:



Right.  I think I'll leave this one alone.  It worked purely by
chance previously because dash incorrectly used the first IFS
character even when it's outside of double-quotes, which is why
the expansion ${@%2 C3} matches.


When the first IFS character is a whitespace character, which it usually is,
it's not incorrect: inserting that character and then splitting on it
produces the exact results that POSIX requires. It's only when IFS doesn't
start with a whitespace character that it's incorrect.


Where does POSIX require this?


It doesn't, and I didn't say it did. POSIX doesn't care how it's 
implemented, POSIX cares about the results produced. When another 
approach produces the same results, both approaches are equally correct.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 10:10, Herbert Xu wrote:

On Fri, Mar 23, 2018 at 09:27:37AM +0100, Harald van Dijk wrote:


Also the above. But it breaks a traditional ash extension:

   unset IFS
   set -- A1 B2 C3
   echo ${@%2 C3}

This used to print A1 B, but prints A1 B2 C3 with your patch.

   echo ${@%2}

This used to print A1 B2 C3, but prints A1 B with your patch.


Hmm, it still does on my machine:


Apologies, I ended up sending the wrong test case. It's when IFS has its 
default value that the behaviour is changed. It's when it's unset that 
it still works.


Initial testing was against 0.5.9.1 plus your patch, now retested 
against c166b71 plus your patch.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Harald van Dijk

On 23/03/2018 05:37, Herbert Xu wrote:

You're right.  The proper fix to this is to ensure that nulonly
is not set in varvalue for $*.  It should only be set for $@ when
it's inside double quotes.

In fact there is another bug while we're playing with $@/$*.
When IFS is set to a non-whitespace character such as :, $*
outside quotes won't remove empty fields as it should.


That's not limited to empty fields:

  IFS=,
  set -- , ,
  set -- $@
  echo $#

This should print 2, not 3. (bash gets this wrong too.)


This patch fixes both problems.


Also the above. But it breaks a traditional ash extension:

  unset IFS
  set -- A1 B2 C3
  echo ${@%2 C3}

This used to print A1 B, but prints A1 B2 C3 with your patch.

  echo ${@%2}

This used to print A1 B2 C3, but prints A1 B with your patch.

This extension was already pretty much broken within quoted expansions: 
"${@%2}" would already expand to A1 B. Your patch extends that to the 
non-quoted case.


I do not know if you want to keep this extension in.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Harald van Dijk

On 22/03/2018 22:38, Martijn Dekker wrote:

Op 22-03-18 om 20:28 schreef Harald van Dijk:

On 22/03/2018 03:40, Martijn Dekker wrote:

This patch fixes the bug that, given no positional parameters, unquoted
$@ and $* incorrectly generate one empty field (they should generate no
fields). Apparently that was a side effect of the above.


This seems weird though. If you want to remove the recording of empty
regions because they are pointless, then how does removing them fix a
bug? Doesn't this show that empty regions do have an effect? Perhaps
they're not supposed to have any effect, perhaps it's a specific
combination of empty regions and something else that triggers some bug,
and perhaps that combination can no longer occur with your patch.


The latter is my guess, but I haven't had time to investigate it.


Looking into it again:

When IFS is set to an empty string, sepc is set to '\0' in varvalue(). 
This then causes *quotedp to be set to true, meaning evalvar()'s quoted 
variable is turned on. quoted is then passed to recordregion() as the 
nulonly parameter.


ifsp->nulonly has a bigger effect than merely selecting whether to use 
$IFS or whether to only split on null bytes: in ifsbreakup(), nulonly 
also causes string termination to be suppressed. That's correct: that 
special treatment is required to preserve empty fields in "$@" 
expansion. But it should *only* be used when $@ is quoted: ifsbreakup() 
takes nulonly from the last IFS region, even if it's empty, so having an 
additional zero-length region with nulonly enabled causes confusion.


Passing quoted by value to varvalue() and not attempting to modify it 
should therefore, and in my quick testing does, also work to fix the 
original $@ bug.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-22 Thread Harald van Dijk

On 22/03/2018 22:01, Martijn Dekker wrote:

Op 22-03-18 om 20:12 schreef Harald van Dijk:

Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any
busybox code is imported into dash, that from then on dash can only be
distributed under GPL?


I thought dash was already effectively under the GPL due to including
the output of mksignames.c.


To be honest, I'm getting confused now.

I knew that the output of GPL programs is not generally covered by the 
GPL itself. And that's all that COPYING says:


 "mksignames.c:

  This file is not directly linked with dash.  However, its output is."

But about the not "generally" covered: as mentioned on 
<https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#WhatCaseIsOutputGPL>, 
if the output includes part of the GPL-licensed code, then the output is 
GPL-licensed as well. And admittedly, signames.c does contain part of 
mksignames.c.


It seems unclear to packagers, at any rate:

Fedora and Gentoo say dash is BSD-licensed. MacPorts lists it as GPL-2+.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Harald van Dijk

On 22/03/2018 03:40, Martijn Dekker wrote:

evalvar() records empty expansion results (varlen == 0) as string
regions that need to be scanned for IFS characters. This is pointless,
because there is nothing to split.


varlen may be set to -1 when an unset variable is expanded. If it's 
beneficial to avoid recording empty regions to scan for IFS characters, 
should those also be excluded?



This patch fixes the bug that, given no positional parameters, unquoted
$@ and $* incorrectly generate one empty field (they should generate no
fields). Apparently that was a side effect of the above.


This seems weird though. If you want to remove the recording of empty 
regions because they are pointless, then how does removing them fix a 
bug? Doesn't this show that empty regions do have an effect? Perhaps 
they're not supposed to have any effect, perhaps it's a specific 
combination of empty regions and something else that triggers some bug, 
and perhaps that combination can no longer occur with your patch.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC/Feature request: UTF-8 support

2018-03-22 Thread Harald van Dijk

On 22/03/2018 16:36, Martijn Dekker wrote:

Busybox ash [...] The licenses are bidirectionally compatible.


Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any 
busybox code is imported into dash, that from then on dash can only be 
distributed under GPL?


I have a local patch set of stuff that I haven't considered in an 
appropriate state to send to the list. One of the features I've got 
working is multibyte locale support. It doesn't assume UTF-8 (only 
assumes stateless encodings) and is tolerant of malformed strings. I've 
got it working in pattern matching, in glob() sorting, in trimming (#, 
##, % and %%), in field splitting, and in "$*" joining. I'll see if I 
can polish it a bit and make it available when I've got some more free 
time again, but I'm not sure how soon that will be.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Expansion-lookalikes in heredoc delimiters

2018-03-15 Thread Harald van Dijk

On 15/03/2018 18:11, Herbert Xu wrote:

On Thu, Mar 15, 2018 at 05:29:27PM +0100, Harald van Dijk wrote:


That's because POSIX specifies that after ${, everything up to the matching
}, not including nested strings, expansions, etc., is part of the word. No
exception is made when it spans multiple lines.

Another instance of this is

   if false; then
 echo ${
   fi
   }
   fi
   echo ok

This is accepted by bash, ksh, zsh and posh.


Interestingly, ksh bombs out on this:

echo ${
fi
}

So this behaviour is not exactly consistent.


It's perfectly consistent. It gets accepted at parse time, it only gets 
rejected at expansion time. That's how dash generally behaves as well:


  $ dash -c 'echo ${x^}'
  dash: 1: Bad substitution
  $ dash -c ': || echo ${x^}'
  $

Historically, as I understand it, ash would reject both of these, but 
you specifically modified dash to accept invalid expansions during 
parsing (which makes sense):



<https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3df3edd13389ae768010bfacee5612346b413e38>


In any case, this substituion is invalid in all of these shells so
does it really matter?


Okay, it can be trivially modified to something that does work in other 
shells (even if it were actually executed), but gets rejected at parse 
time by dash:


  if false; then
: ${$+
  }
  fi

It'd be nice if all shells used the same parse rules, so that scripts 
can detect dynamically which expansions are supported, but don't have to 
go through ugly eval commands to use them.



That's because bash doesn't ever match multi-line heredoc delimiters. Which
is what POSIX technically requires:

   "The here-document shall be treated as a single word that begins after the
next  and continues until there is a line containing only the
delimiter and a , with no  characters in between."

No single line will ever match a multi-line heredoc delimiter.


Sure.  But this is a quality of implementation issue.  If you're
never going to match the delimter you should probably emit an error
at the very start rather than at the EOF.


POSIX isn't clear on whether reaching EOF without seeing the heredoc 
delimiter is an error and shells disagree. dash silently accepts it, as 
do ksh, yash and zsh.


When EOF is an acceptable way to terminate a heredoc, a delimiter which 
never matches can be useful to force the complete remainder of the file 
to be treated as a heredoc.



Another interesting thing to try is

cat << ${ foo
${

Also you can look at the quotes:

cat << "
"

IOW it's not clear that "word" in this context necessarily follows
the same rules used during normal tokenisation.


Quotes are clear: as far as they go, this *must* follow the same rules used
during normal tokenisation. Does any shell disagree?


I was talking about multi-line quotes, which you conveniently dismissed :)


I got back to that :) I picked another example first because I thought 
it would be clearer using that that the normal tokenisation rules should 
be used.



For your example, does any shell, including dash, *not* treat this as a
multi-line heredoc delimiter consisting of a newline character (which may or
may not be matched by two blank lines, depending on the shell)?


Yes, almost every shell fails with the multi-line delimiter inside
double quotes.  Only dash and ksh93 seem to get it right.


posh accepts multi-line heredoc delimiters as well.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Expansion-lookalikes in heredoc delimiters

2018-03-15 Thread Harald van Dijk

On 15/03/2018 15:52, Herbert Xu wrote:

On Thu, Mar 15, 2018 at 12:41:10PM +0100, Harald van Dijk wrote:


It is if you want to do it the way POSIX specifies. You're adding a special
exception in the parser. I don't see how this approach can be extended to
handle the other examples in my mail:


I don't think it's exactly clear what POSIX says on this.


I don't think it's clear what POSIX means to say, but I do think it's 
clear what it does say.



   cat <<`one two`
   ok
   `one two`


Try this one:

cat << ${
${

bash/ksh93 both will get stuck forever until a right brace is
entered.


That's because POSIX specifies that after ${, everything up to the 
matching }, not including nested strings, expansions, etc., is part of 
the word. No exception is made when it spans multiple lines.


Another instance of this is

  if false; then
echo ${
  fi
  }
  fi
  echo ok

This is accepted by bash, ksh, zsh and posh.

However, in this instance, I'm having trouble finding where, but IIRC, 
POSIX says or means to say that it's okay to flag this as an invalid 
parameter expansion at parse time even though the evaluation would 
otherwise be skipped.



While other shells all behave the way dash does.


All? At least two others don't: yash rejects ${ as invalid because of 
the missing parameter name. zsh agrees with bash and ksh that it should 
look for the closing }.



Considering the fact that even if you closed the brace after
the newline bash would still be stuck forever,


That's because bash doesn't ever match multi-line heredoc delimiters. 
Which is what POSIX technically requires:


  "The here-document shall be treated as a single word that begins 
after the next  and continues until there is a line containing 
only the delimiter and a , with no  characters in between."


No single line will ever match a multi-line heredoc delimiter.


I think this
behaviour is suboptimal.  ksh93 seems to do the right thing though.


Yes, I agree that multi-line heredoc delimiters are useful.


Another interesting thing to try is

cat << ${ foo
${

Also you can look at the quotes:

cat << "
"

IOW it's not clear that "word" in this context necessarily follows
the same rules used during normal tokenisation.


Quotes are clear: as far as they go, this *must* follow the same rules 
used during normal tokenisation. Does any shell disagree?


  cat << "A ' B"
  ok 1
  A ' B
  echo ok 2

I expect this to print

  ok 1
  ok 2

and I'm not finding any shell that disagrees, not even dash. Do you have 
an example of a shell that sees only "A as the heredoc delimiter, or one 
that perhaps performs quote removal on the inner '?


For your example, does any shell, including dash, *not* treat this as a 
multi-line heredoc delimiter consisting of a newline character (which 
may or may not be matched by two blank lines, depending on the shell)?



Anyway, dash has had CHKEOFMARK since 2007 so this is just an
extension of existing behaviour.


This is definitely true: your patch, regardless of whether it matches 
POSIX's requirements, is consistent with how dash has behaved for a long 
time.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-09 Thread Harald van Dijk

On 3/8/18 1:40 AM, Harald van Dijk wrote:
If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached.


Even though it won't be accepted in dash, I continued with this approach 
for my own use. I've now got it to about 1800 bytes smaller (at -Os -s).


After the other changes I'd done, it became apparent to me that the 
syntax tables were unnecessary, and that they'd become fairly easy to 
get rid of. This was a big space saver that may be possible to apply to 
your version as well.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..acd5fdf 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -115,8 +115,8 @@ STATIC char *exptilde(char *, char *, int);
 STATIC void expbackq(union node *, int);
 STATIC const char *subevalvar(char *, char *, int, int, int, int, int);
 STATIC char *evalvar(char *, int);
-STATIC size_t strtodest(const char *, const char *, int);
-STATIC void memtodest(const char *, size_t, const char *, int);
+STATIC size_t strtodest(const char *, int);
+STATIC void memtodest(const char *, size_t, int);
 STATIC ssize_t varvalue(char *, int, int, int *);
 STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -396,7 +386,7 @@ done:
 	if (!home || !*home)
 		goto lose;
 	*p = c;
-	strtodest(home, SQSYNTAX, quotes);
+	strtodest(home, quotes | EXP_QUOTED);
 	return (p);
 lose:
 	*p = c;
@@ -521,7 +511,6 @@ expbackq(union node *cmd, int flag)
 	char *p;
 	char *dest;
 	int startloc;
-	char const *syntax = flag & EXP_QUOTED ? DQSYNTAX : BASESYNTAX;
 	struct stackmark smark;
 
 	INTOFF;
@@ -535,7 +524,7 @@ expbackq(union node *cmd, int flag)
 	if (i == 0)
 		goto read;
 	for (;;) {
-		memtodest(p, i, syntax, flag & QUOTES_ESC);
+		memtodest(p, i, flag & (QUOTES_ESC | EXP_QUOTED));
 read:
 		if (in.fd < 0)
 			break;
@@ -651,8 +640,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -844,7 +832,7 @@ end:
  */
 
 STATIC void
-memtodest(const char *p, size_t len, const char *syntax, int quotes) {
+memtodest(const char *p, size_t len, int quotes) {
 	char *q;
 
 	if (unlikely(!len))
@@ -855,11 +843,17 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 	do {
 		int c = (signed char)*p++;
 		if (c) {
-			if ((quotes & QUOTES_ESC) &&
-			((syntax[c] == CCTL) ||
-			 (((quotes & EXP_FULL) || syntax != BASESYNTAX) &&
-			  syntax[c] == CBACK)))
-USTPUTC(CTLESC, q);
+			if (quotes & QUOTES_ESC) {
+switch (c) {
+	case '\\':
+	case '!': case '*': case '?': case '[': case '=':
+	case '~': case ':': case '/': case '-': case ']':
+		if (quotes & EXP_QUOTED)
+	case CTLVARS:
+			USTPUTC(CTLESC, q);
+		break;
+}
+			}
 		} else if (!(quotes & QUOTES_KEEPNUL))
 			continue;
 		USTPUTC(c, q);
@@ -870,13 +864,10 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 
 
 STATIC size_t
-strtodest(p, syntax, quotes)
-	const char *p;
-	const char *syntax;
-	int quotes;
+strtodest(const char *p, int quotes)
 {
 	size_t len = strlen(p);
-	memtodest(p, len, syntax, quotes);
+	memtodest(p, len, quotes);
 	return len;
 }
 
@@ -895,15 +886,13 @@ varvalue(char *name, int varflags, int flags, int *quotedp)
 	int sep;
 	char sepc;
 	char **ap;
-	char const *syntax;
 	int quoted = *quotedp;
 	int subtype = varflags & VSTYPE;
 	int discard = subtype == VSPLUS || subtype == VSLENGTH;
-	int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
+	int quotes = quoted | (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
 	ssize_t len = 0;
 
 	sep = (flags & EXP_FULL) << CHAR_BIT;
-	syntax = quoted ? DQSYNTAX : BASESYNTAX;
 
 	switch (*name) {
 	case '$':
@@ -946,11 +935,11 @@ param:
 		if (!(ap = shellparam.p))
 			return -1;
 		while ((p = *ap++)) {
-			len += strtodest(p, syntax, quotes);
+			le

Re: [PATCH] parser: Fix single-quoted patterns in here-documents

2018-03-09 Thread Harald van Dijk

On 3/9/18 4:07 PM, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 07:35:53PM +0100, Harald van Dijk wrote:


Related:

   x=*; cat <

I don't think this is related to our patches at all.


Not related to our patches, but related to the original bug. It's 
another instance where quoted * is wrongly treated as unquoted.


Your fix looks good to me.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Expansion-lookalikes in heredoc delimiters

2018-03-08 Thread Harald van Dijk

Consider this:

  cat <<`bad`
  `bad`

As far as I can tell, this is technically valid, supposed to print 
nothing, and accepted in most other shells.


POSIX Token Recognition says `bad` is to be recognised as a single token 
of type word, and any word can act as a heredoc delimiter. POSIX 
Here-Document says only quote removal is performed on the word.


However, dash does not always preserve the original spelling of the 
word. That's what's going on here. Because dash hasn't preserved the 
`bad` spelling (it's been turned into CTLBACKQ), the check for the end 
of the heredoc doesn't pick it up, and it instead gets taken as a 
command substitution.


When an actual 0x84 byte is seen in the input, *that* gets taken as the 
heredoc delimiter:


  dash -c "tail -n1 <<\`:\`
  `printf \\\204`
  ok
  \`:\`"

In a locale in which 0x84 is a valid character (since dash doesn't 
support locales, that's easy, it's always valid), it's supposed to print 
"ok". dash instead interprets the second line as the end of the heredoc 
and subsequently issues an error message when it interprets "ok" on line 
3 as a command to execute.


This is pretty clearly a case that no serious script is ever going to 
encounter, not to mention one that many shells don't even attempt to 
support, at least not completely, so I don't think this is a real 
problem. I'm mentioning it anyway because I was trying to come up with a 
few more test cases for the parser, and I think it's good to have a 
record not only of what worked, what has been made to work, and what got 
broken, but also of what's never going to be work.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-08 Thread Harald van Dijk

On 08/03/2018 08:55, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote:


If the syntax stack is to be stored on the actual stack, then real recursion
could be used instead, as attached. I tried to avoid recursion for the cases
that unpatched dash already handled properly.


It does look a lot simpler than I expected.  However, this patch
is still 30% bigger than my patch :)


That's a bit unfair: that's mostly due to moved code, not to changed code.

But size-wise, your patch still wins: parser.o does become larger with 
my patch than with yours, largely due to my attempts to only recurse 
when needed.



As real recursion doesn't seem to buy us much I think I'll stick
with the syntax stack for now.


@@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char 
*eofmark, int striptabs)
c != '\\' && c != '`' &&
c != '$' && (
c != '"' ||
-   eofmark != NULL
+   (eofmark != NULL && 
!varnest)
+   ) && (
+   c != '}' ||
+   !varnest ||
+   (dqvarnest ? innerdq : 
dblquote)


So this seems to be the only substantial difference between your
patch and mine.  I have looked at the behaviour of other shells
and I think I will stick with my patch's behaviour for now.


The first line of this part of my patch is about something else:

  x=\"; cat <
In general, if there are disagreements between shells and the
standard is not specific enough, I'll pick the approach that
results in simpler code.


Fair enough.

Cheers,
Harald van Dijk


Thanks,


--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2018-03-07 Thread Harald van Dijk

On 3/8/18 7:30 AM, Herbert Xu wrote:

Could you please resend these patches as two separate emails please?
Patchwork cannot handle two patches in one email:

https://patchwork.kernel.org/patch/10264661/


Ah, didn't realise that. I'll keep that in mind for future mails.

Actually, I'll withdraw my second patch for now, I spotted another 
problem in alias handling (already present in 0.5.9.1) and want to 
double-check that my patch didn't make things worse.



Also it would be nice if you could include the patch descriptions
in each email as these will go into the git tree.


parser: use pgetc_eatbnl() in more places.

dash has a pgetc_eatbnl function in parser.c which skips any 
backslash-newline combinations. It's not used everywhere it could be. 
There is also some duplicated backslash-newline handling elsewhere in 
parser.c. Replace most of the calls to pgetc() with calls to 
pgetc_eatbnl() and remove the duplicated backslash-newline handling.
diff --git a/src/parser.c b/src/parser.c
index 382658e..8b945e3 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -836,7 +830,7 @@ static int pgetc_eatbnl(void)
 	int c;
 
 	while ((c = pgetc()) == '\\') {
-		if (pgetc() != '\n') {
+		if (pgetc2() != '\n') {
 			pungetc();
 			break;
 		}
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	goto endword;	/* exit outer loop */
 USTPUTC(c, out);
 nlprompt();
-c = pgetc();
+c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 goto loop;		/* continue outer loop */
 			case CWORD:
 USTPUTC(c, out);
@@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	USTPUTC(CTLESC, out);
 	USTPUTC('\\', out);
 	pungetc();
-} else if (c == '\n') {
-	nlprompt();
 } else {
 	if (
 		dblquote &&
@@ -997,7 +989,7 @@ quotemark:
 	USTPUTC(c, out);
 	--parenlevel;
 } else {
-	if (pgetc() == ')') {
+	if (pgetc_eatbnl() == ')') {
 		USTPUTC(CTLENDARI, out);
 		if (!--arinest)
 			syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 	USTPUTC(c, out);
 }
 			}
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 	

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 8:04 PM, Harald van Dijk wrote:

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( 
$(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, 
again due
to the non-recursive nature of dash's parser. A small generalisation 
of what
I've been doing so far could handle this, so it makes sense to me to 
try to

achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached. I tried to avoid recursion 
for the cases that unpatched dash already handled properly.



It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.


To handle that it can just look at dblquote instead, included in the 
attached.


In this patch, I managed let "${$+\}}" expand to }, but "${$+"\}"}" to 
\}, for the reasons I gave earlier.


Martijn Dekker's test case of 'x=0; x=$((${x}+1))' also works with this.

Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..903e250 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -1644,7 +1633,6 @@ char *
 _rmescapes(char *str, int flag)
 {
 	char *p, *q, *r;
-	unsigned inquotes;
 	int notescaped;
 	int globbing;
 
@@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
-	inquotes = 0;
 	globbing = flag & RMESCAPE_GLOB;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
-			inquotes = ~inquotes;
 			p++;
 			notescaped = globbing;
 			continue;
 		}
+		if (*p == '\\') {
+			/* naked back slash */
+			notescaped = 0;
+			goto copy;
+		}
 		if (*p == (char)CTLESC) {
 			p++;
 			if (notescaped)
 *q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/expand.h b/src/expand.h
index 26dc5b4..90f5328 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -55,7 +55,6 @@ struct arglist {
 #define	EXP_VARTILDE	0x4	/* expand tildes in an assignment */
 #define	EXP_REDIR	0x8	/* file glob for a redirection (1 match only) */
 #define EXP_CASE	0x10	/* keeps quotes around for CASE pattern */
-#define EXP_QPAT	0x20	/* pattern in quoted parameter expansion */
 #define EXP_VARTILDE2	0x40	/* expand tildes after colons only */
 #define EXP_WORD	0x80	/* expand word in parameter expansion */
 #define EXP_QUOTED	0x100	/* expand word in double quotes */
diff --git a/src/parser.c b/src/parser.c
index 382658e..17d60b0 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -827,7 +827,8 @@ xxreadtoken(void)
 		}
 	}
 breakloop:
-	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	return lasttoken;
 #undef RETURN
 }
 
@@ -855,47 +856,147 @@ static int pgetc_eatbnl(void)
  * word which marks the end of the doc

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, again due
to the non-recursive nature of dash's parser. A small generalisation of what
I've been doing so far could handle this, so it makes sense to me to try to
achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


Nice approach.


It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.



As I have not reverted the run-time quote book-keeping, the handling
of $@/$* remains unchanged.


I'll try to re-do those fixes based on the approach you're going for here.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2018-03-07 Thread Harald van Dijk

On 3/7/18 7:51 AM, Herbert Xu wrote:

On Wed, Mar 07, 2018 at 07:49:16AM +0100, Harald van Dijk wrote:


This was wrong in the original patch, but I'm not seeing it in the updated
patch that you replied to. When parsing a heredoc where part of delimiter is
quoted, syntax==SQSYNTAX. Since the calls to pgetc_eatbnl() are conditional
on syntax!=SQSYNTAX, there shouldn't be a problem. It would be a different
story if the delimiter could be an unquoted backslash, but thankfully that
is not possible.


Good point.  In that case please resend it with the pgetc2 change
and it should be good to go.


Here you go.

The problem with

  dash -c 'alias x=
  x'

and

  dash -c 'alias bs=\\
  bs
  '

looks like it just needs one extra line, so also attached as a separate 
patch.


Cheers,
Harald van Dijk
diff --git a/src/parser.c b/src/parser.c
index 382658e..8b945e3 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-nlprompt();
-continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -836,7 +830,7 @@ static int pgetc_eatbnl(void)
 	int c;
 
 	while ((c = pgetc()) == '\\') {
-		if (pgetc() != '\n') {
+		if (pgetc2() != '\n') {
 			pungetc();
 			break;
 		}
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 return readtoken();
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	goto endword;	/* exit outer loop */
 USTPUTC(c, out);
 nlprompt();
-c = pgetc();
+c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 goto loop;		/* continue outer loop */
 			case CWORD:
 USTPUTC(c, out);
@@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	USTPUTC(CTLESC, out);
 	USTPUTC('\\', out);
 	pungetc();
-} else if (c == '\n') {
-	nlprompt();
 } else {
 	if (
 		dblquote &&
@@ -997,7 +989,7 @@ quotemark:
 	USTPUTC(c, out);
 	--parenlevel;
 } else {
-	if (pgetc() == ')') {
+	if (pgetc_eatbnl() == ')') {
 		USTPUTC(CTLENDARI, out);
 		if (!--arinest)
 			syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 	USTPUTC(c, out);
 }
 			}
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 			heredoc = (struct heredoc *)stalloc(sizeof (struct here

Re: dash tested against ash testsuite: 17 failures

2018-03-06 Thread Harald van Dijk

On 3/6/18 9:45 AM, Herbert Xu wrote:

On Wed, Oct 12, 2016 at 07:24:26PM +0200, Harald van Dijk wrote:



I would have expected another exception to be in alias expansions that
end in a backslash. Shells are not entirely in agreement there, but most
appear to treat this the regular way, meaning

   dash -c 'alias bs=\\
   bs
   '

prints nothing.


I think your patch changes this.  In order to preserve the existing
behaviour (which seems logical), you should change the second pgetc
call in pgetc_eatbnl to pgetc2.


Oh, indeed, thanks.

There's another problem: when there is no following command (as in the 
above example), things break. A shorter reproducer that has failed for 
years is


  $ dash -c 'alias x=
  x'
  dash: 2: Syntax error: end of file unexpected

This breaks because the part where list() checks for NL/EOF, 
checkkwd==0, so aliases aren't expanded. Immediately after that, 
checkkwd is set and the next call to readtoken() would return TEOF, but 
by that point, dash has already committed to parsing a command.


Since this is actually a long-standing problem, not something introduced 
by the patch, I think it's okay to ignore for now. Do you agree?



With more extensive testing, the only issue I've seen is what Jilles
Tjoelker had already mentioned, namely that backslash-newline should be
preserved inside single-quoted strings, and also that it should be
preserved inside heredocs where any part of the delimiter is quoted:

cat <<\EOF
\
EOF

dash's parsing treats this mostly the same as a single-quoted string,
and the same extra check handles both cases.

Here's an updated patch. Hoping this looks okay and can be applied.



I'm fine with the concept.  However, your patch also breaks here-
document parsing when the delimiter is a single backslash.

cat << "\"
\

>

If you can fix these two problems it should be good to go.


As Martijn Dekker wrote, this should work when the backslash is escaped 
or single-quoted, and in my testing does. But what you have is a nice 
start of another corner case:


  cat << "\"
  \
  "EOF
  ok
  "
  EOF

I'm happily surprised to see that dash accepts and gives sensible 
treatment to multi-line heredoc delimiters.


Okay with your one extra pgetc()=>pgetc2() change, then?

Cheers,
Harald van Dijk


Cheers,

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))

2018-03-05 Thread Harald van Dijk

On 3/6/18 2:15 AM, Martijn Dekker wrote:

Op 06-03-18 om 00:23 schreef Harald van Dijk:

On 3/6/18 12:23 AM, Martijn Dekker wrote:

"All variables shall be initialized to zero if they are not otherwise
assigned by the input to the application."


In your example, x has been assigned. Its value is empty, but empty and
unset are two different things.

But I see now that dash gives the same error when x is unset. That part
is definitely wrong for exactly the reason you point out.


By default, all unset variables are considered empty, so they should act
the same.


Except where special behaviour is specified for unset variables, which I 
believed to be the case here. But see below.



That brings me to another possible issue, though:

$ dash -uc 'unset x; echo $((x))'
0

Shouldn't that give an "parameter not set" error under set -u?
Interestingly, only bash and AT ksh93 currently do that.


This is <http://austingroupbugs.net/view.php?id=559>. It's unspecified 
for now, but will likely become an error in the future.



Also, consensus among existing shells appears to be universal.


Mostly, but not fully. yash is the exception:

   $ yash -c 'unset x; echo $((x))'
   0
   $ yash -c 'x=; echo $((x))'




But in POSIX mode it acts like other shells:

$ yash -o posix -c 'x=; echo $((x))'
0


I stand corrected! Indeed, then it does appear that all shells are in 
agreement. And when POSIX is (arguably) unclear but all shells are in 
agreement, the shells' behaviour should generally just be taken as the 
correct behaviour.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 9:08 PM, Martijn Dekker wrote:

Op 04-03-18 om 16:46 schreef Harald van Dijk:

FreeBSD sh also prints a blank line here.

[...]

Like above, FreeBSD sh behaves like ksh.


I stand corrected.

Is there any port of FreeBSD sh to other operating systems? It would be
much more convenient for me to include it in my tests if I didn't have
to launch a FreeBSD VM and rsync & run the test scripts separately.


None that I know of. Running the test script over ssh might be slightly 
less difficult, but nothing as easy as a port. The source code contains 
several very much FreeBSD-specific bits.



Yes, the inconsistency should be fixed. Either it should be treated as
quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I
have no strong feelings on which it should be.


Neither do I, so I would default to the behaviour that both pre-exists
in dash and corresponds with the majority of other shells.


I went for the behaviour that required the fewest changes for now, which 
is to treat them as unquoted. If it is agreed that it should be quoted, 
it requires some additional (minor) complications in the parser, because 
the existing state would no longer be sufficient to determine whether } 
should end the substitution. But yes, I agree that given how long dash 
has treated this as quoted, it makes sense to keep that, unless there's 
a compelling reason not to.



[...]

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is
quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02

| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


I believe this actually requires dash's behaviour. This says the first }
isn't examined in determining the matching '}', but only that: it just
says the parameter expansion expression is $+\}. It doesn't say the
backslash is removed.


I believe the word "escaped" implies that removal. If a '}' is escaped
by a backslash, it's implied that the backslash is removed as this
escaping is parsed, just as it's implied that quotes are removed from a
quoted string.


That's not implied, that's stated:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_07


The quote characters ( , single-quote, and double-quote) that were 
present in the original word shall be removed unless they have themselves been quoted.


In this case, the backslash was quoted, so this doesn't apply.


I agree that it would be much better to print } here though.


All other current shells except bosh (schilytools sh) agree, too -- even
FreeBSD sh, and I checked it this time.


Shells agree on the simple cases to remove the backslash:

  ${x+\}}
  "${x+\}}"
  <In ${x+"\}"}, most shells keep the backslash. ksh and FreeBSD sh remove 
it. I think it makes most sense to keep it, because the general rule for 
\ in double-quoted strings is that it's only removed if the following 
character would have been special. This patch removes it.


In "${x+"\}"}", of the shells that treat the \} as quoted, most shells 
keep the backslash. bash removes it. I think it makes most sense to keep 
the backslash for the same reason as above.


In <zsh keep it, and FreeBSD sh treats the " as a literal. Again I think it 
makes most sense to keep the backslash (but remove the "). This patch 
removes it.


Another pre-existing dash parsing bug that I encountered now is $(( ( 
$(( 1 )) ) )). This should expand to 1, but gives a hard error in dash, 
again due to the non-recursive nature of dash's parser. A small 
generalisation of what I've been doing so far could handle this, so it 
makes sense to me to try to achieve this at the same time.


Cheers,
Harald van Dijk
diff --git a/src/Makefile.am b/src/Makefile.am
index 139355e..525f8ef 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax
 signames.c: mksignames
 	./$^
 
-mksyntax: token.h
+mksyntax: parser.h token.h
 
 $(HELPERS): %: %.c
 	$(COMPILE_FOR_BUILD) -o $@ $<
diff --git a/src/TOUR b/src/TOUR
index 056e79b..f6a4641 100644
--- a/src/TOUR
+++ b/src/TOUR
@@ -150,6 +150,7 @@ special codes defined in parser.h.  The special codes are:
 CTLVAR  Variable substitution
 CTLENDVAR   End of variable substitution
 CTLBACKQCommand substitution
+CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
 CTLESC  Escape next character
 
 A variable substitution contains the following elements:
@@ -

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 7:05 PM, Denys Vlasenko wrote:

On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk <har...@gigawatt.nl> wrote:

On 02/03/2018 18:00, Denys Vlasenko wrote:


On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <har...@gigawatt.nl>
wrote:


Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Are you sure about this? "Patterns Matching a Single Character"'s first
paragraph contains "A  character shall escape the following
character. The escaping  shall be discarded." The shell does this
first.


I have problems with "The shell does this first" statement.

It's useful to view the entire discussion of glob pattern matching
as a discussion of how fnmatch(pattern, string, flags) should behave
(even if a particular shell implementation chose to not use
C library's fnmatch() to implement its globbing).


Okay. dash has an option to use fnmatch() for pattern matching.

But POSIX says fnmatch() performs pattern matching the way the shell 
does, not the other way around. And there are a few differences that 
follow from it, because character quote status cannot be represented the 
same way in a C string as in the description of the shell.



Otherwise (IOW: if you allow gobbing to depend on shell's quoting),
rules for globbing for different applications will not be consistent.
Which would be bad.


That's already the case, and cannot be helped. The shell pattern 
"[a]"[a] is supposed to match a file named "[a]a", and does. But if that 
file exists, and I run find . -name '"[a]"[a]', I won't get any results. 
I could use find . -name '\[a\][a]' for that though. And dash, if built 
with fnmatch() support, translates it to that in order to pass it to 
fnmatch().


This matches what POSIX says: POSIX doesn't make the removal of " part 
of pattern matching, and find -name and fnmatch() *only* implement the 
pattern matching part of the shell.



As I see it, shell should massage input according to shell rules
(quote/bkslash removal et al), then use fnmatch() or glob(), or its own
internal implementations of them.

bash seems to not do it. It probably has a "combined" routine
which does both in one step, which allows quote removal
to interfere with globbing. Here's the proof:

$ x='a]'; echo _${x#[a\]]}_
_]_

In the above code, what pattern should be fed to fnmatch(),
assuming shell uses fnmatch() to implement ${x#pattern}?
Pattern should be "[a]]" because by shell rules "\]" in
an unquoted string is "]".


By the normal shell rules, there are two places backslashes get removed. 
The first is during pattern matching. Not before, during.


If fnmatch() is used to implement shell pathname expansion and other 
pattern matching, the string to pass here is literally [a\]]. Just like 
how for \*, the string to pass is literally \*, and it would be horribly 
wrong to pass * here as if it were unquoted. fnmatch() will remove the 
backslashes and take the following character literally.


The second place where the shell removes backslashes is during quote 
removal. But this takes place after pathname expansion.



But try this:

$ x='a]'; echo _${x#[a]]}_
__

Here, pattern should be "[a]]" as well - it literally is.


Here, the pattern should indeed be [a]].


But the results are different!

Evidently, bash does _not_ perform quote removal (more precisely,
backslash removal) on pattern string. Somehow, globbing code
knows \ was there.

(And this globbing code, in my opinion, also misinterprets [a\]]
as "set of 'a' or ']'", but (a) I might be wrong on this, and
(b) this is a bit offtopic, we discuss ${x#pattern} handling here).

To me, it looks that bash behavior is buggy regardless of what \]
means in glob patterns. These two should be equivalent:

x='a]'; echo _${x#[a\]]}_
x='a]'; echo _${x#[a]]}_

because they should use the same pattern for globbing match.


See above.


Alternative possibility is that pattern in ${x#pattern} is not handled
by the usual shell rules: backslashes are not removed.
This would be VERY ugly as soon as nested variable expansions are considered.


There are and there are supposed to be a few differences in how 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 4:26 PM, Martijn Dekker wrote:

Op 04-03-18 om 11:44 schreef Harald van Dijk:

When CTLENDVAR is seen, I double-check that syntax has the expected
value. This fixes the handling of "${$+"}"}", where the inner } was
seen as ending the variable substitution.


Also looks like dash with your patch considers the "}" to be quoted:

$ src/dash -c 'IFS=}; printf %s\\n "${$+"}"}"'
}

Only AT ksh93 prints an empty string there, as it doesn't consider the
double quotes to be nested, so the "}" is unquoted. Ash produces a
syntax error, like dash before this patch. All other shells print a }.
So dash with this patch behaves like the majority.


FreeBSD sh also prints a blank line here.


This changes how "${x+"$y"}" get handled: POSIX is silent about
whether the $y should be treated as quoted. dash has treated it as
quoted for a very long time. ash has historically treated it as
unquoted. With this patch, it gets treated as unquoted.


That seems inconsistent with how it handles "${$+"}"}", in which the "}"
is treated as quoted (see above).

ksh93 is the only existing shell that treats the $y as unquoted, so I
think it would be better if dash continued to treat the $y as quoted.

Even if not, the inconsistency should be fixed.


Like above, FreeBSD sh behaves like ksh.

Yes, the inconsistency should be fixed. Either it should be treated as 
quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I 
have no strong feelings on which it should be.



Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
"$@" got handled and reverting that changed it, I looked into how
this works and fixed another bug. It also changes the handling of $*
and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
properly at all, even if all parameters were non-empty. dash 0.5.9.1
preserves empty parameters. With this patch, they get removed just
like in bash. POSIX allows for either.

I don't think it does. POSIX implies that empty $@ and $* generates zero
fields. 2.5.2 Special Parameters:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
| @
|Expands to the positional parameters, starting from one, initially
|producing one field for each positional parameter that is set.

Since there are no positional parameters, no fields should initially be
produced at all. Same for $*.

So I do believe your patch correctly fixes a bug here.


By empty parameters, I meant parameters that had been set to an empty 
string. It's covered by the 'set -- a ""' in my tests. You're right that 
after 'set --', unquoted $@ should not produce any fields. I hadn't even 
noticed that dash got that wrong and that my patch had fixed it.



I would be a bit surprised if the patch is acceptable in its current
form, but it's worth seeing which of the current results are definitely
correct, which of the results are acceptable, which results may well be
unwanted, and which special cases I missed.


According to my tests (i.e. the modernish test suite), nearly everything
is POSIXly correct. There's only one parameter expansion problem left
that I can find, and it existed before this patch as well:

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


I believe this actually requires dash's behaviour. This says the first } 
isn't examined in determining the matching '}', but only that: it just 
says the parameter expansion expression is $+\}. It doesn't say the 
backslash is removed. Then, \} is taken as part of a double-quoted 
string, and inside double-quoted strings, a backslash that isn't 
followed by $, `, ", \ or a newline is taken as a literal backslash. I 
agree that it would be much better to print } here though.


Thanks for the testing! I'd noticed I had an off-by-one error that 
causes problems when an expansion ends in another expansion 
(${x+${x+}}). I'll try to improve it to also handle the issues you've 
pointed out.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/2/18 11:58 AM, Harald van Dijk wrote:

On 02/03/2018 08:49, Herbert Xu wrote:

If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.


Since I didn't see how to avoid this approach, I went ahead with this 
attempt and the attached is the result.


I started out by reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c. 
Since that also removed some dead code, I re-removed the read code. I 
then modified the byte after CTLVAR so that it didn't just store whether 
the result was quoted, it stored the prior syntax, and forcibly reset 
syntax to either BASESYNTAX or DQSYNTAX as appropriate. Then, in 
CTLENDVAR, I look for the opening CTLVAR, and use that to restore the 
prior syntax. The same goes for CTLARI/CTLENDARI too.


When CTLENDVAR is seen, I double-check that syntax has the expected 
value. This fixes the handling of "${$+"}"}", where the inner } was seen 
as ending the variable substitution.


This fixes more cases than just backslashes and single quotes: another 
character that's special in unquoted contexts is ~, so "${HOME#~}" 
should expand to an empty string.


This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really 
answer this, I think. POSIX says that $(( "123" )) is a syntax error, 
but doesn't address whether " is special when it appears in other places 
than directly in the $((...)). Most shells accept $(( ${$+"123"} )), and 
with this patch, dash accepts it too.


This changes how "${x+"$y"}" get handled: POSIX is silent about whether 
the $y should be treated as quoted. dash has treated it as quoted for a 
very long time. ash has historically treated it as unquoted. With this 
patch, it gets treated as unquoted.


Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how "$@" 
got handled and reverting that changed it, I looked into how this works 
and fixed another bug. It also changes the handling of $* and $@ when 
IFS is set but empty: dash 0.5.8 didn't handle empty IFS properly at 
all, even if all parameters were non-empty. dash 0.5.9.1 preserves empty 
parameters. With this patch, they get removed just like in bash. POSIX 
allows for either.


I would be a bit surprised if the patch is acceptable in its current 
form, but it's worth seeing which of the current results are definitely 
correct, which of the results are acceptable, which results may well be 
unwanted, and which special cases I missed.


Cheers,
Harald van Dijk

command:  echo "\*"
bash: \*
dash 0.5.8:   \ \
dash 0.5.9.1: \ \
dash patched: \*

command:  case \\ab in "\*") echo BUG;; esac
bash:
dash 0.5.8:   BUG
dash 0.5.9.1: BUG
dash patched:

command:  case \\a in "\?") echo BUG;; esac
bash:
dash 0.5.8:   BUG
dash 0.5.9.1: BUG
dash patched:

command:  foo=\\; echo "<${foo#[\\]]}>"
bash: <\>
dash 0.5.8:   <\>
dash 0.5.9.1: <\>
dash patched: <\>

command:  foo=a; echo "<${foo#[a\]]}>"
bash: <>
dash 0.5.8:   <>
dash 0.5.9.1: <>
dash patched: <>

command:  x=yz; echo "${x#'y'}"
bash: z
dash 0.5.8:   yz
dash 0.5.9.1: yz
dash patched: z

command:  x=yz; echo "${x+'y'}"
bash: 'y'
dash 0.5.8:   'y'
dash 0.5.9.1: 'y'
dash patched: 'y'

command:  x=""; echo "${x#"${x+''}"''}"
bash: ''
dash 0.5.8:
dash 0.5.9.1:
dash patched: ''

command:  HOME=/; echo "${HOME#~}"
bash:
dash 0.5.8:   /
dash 0.5.9.1: /
dash patched:

command:  x="13"; echo $(( ${x#'1'} ))
bash: 3
dash 0.5.8:   13
dash 0.5.9.1: 13
dash patched: 3

command:  echo $(( ${$+"123"} ))
bash: 123
dash 0.5.8:   dash: 1: arithmetic expression: expecting primary: " "123" "
dash 0.5.9.1: dash: 1: arithmetic expression: expecting primary: " "123" "
dash patched: 123

command:  set -- a ""; space=" "; printf "<%s>" "$@"$space
bash: <>
dash 0.5.8:   < >
dash 0.5.9.1: < >
dash patched: <>

command

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 17:33, Herbert Xu wrote:

On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:


But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}",
the ' is a quote character. This part is hard. If the above is done simply
using another local variable, then the parse of the nested ${y-} would
clobber that variable.


I don't see why that's hard.  You just need to remember whether
you're in a pattern context (i.e., after a %/%% or #/##).  If you
are, then you need to go back to basesyntax instead of dqsyntax
until the next right brace.


Let me slightly modify my example so that the effect becomes different:

  "${x#"${y-*}"''}"

When the parser has processed

  "${x#"${y-

would the state be "in a pattern context", or "not in a pattern context"?

If the state is "in a pattern context", how do you prevent the next * 
from being taken as unquoted? It should be taken as quoted.


If it stores "not in a pattern context", and the parser is processing 
the last character in


  "${x#"${y-*}

how can it reset the state to "in a pattern context"? Where could this 
state be stored?


With arbitrarily deep nesting of variable expansions, I do not see how 
you can avoid requiring arbitrarily large state, which gets difficult 
with dash's non-recursive parser. Mind you, I do hope I'm missing 
something obvious here!


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 18:00, Denys Vlasenko wrote:

On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <har...@gigawatt.nl> wrote:

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'




I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Are you sure about this? "Patterns Matching a Single Character"'s first 
paragraph contains "A  character shall escape the following 
character. The escaping  shall be discarded." The shell does 
this first. It removes the backslash (remembering that the following 
character is escaped) before it starts interpreting the result as a 
bracket expression, and so never gets to the point where the \ should be 
taken literally.


  case \] in [\]]) echo matched ;; esac

prints "matched" in all shells I can check.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 16:28, Herbert Xu wrote:

On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:



If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my
message from Feb 18, that I'd personally prefer as well. It basically
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax
to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
variable expansion starts, and finding a sensible way to change the syntax
back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an
explicit stack of syntaxes is created for this, but that might be avoidable:
with slight modifications to what gets stored in the byte after
CTLVAR/CTLARI, it might be possible to go back through the parser output to
determine the syntax to revert to. I'll see if I can get that working.


Yes but that's overkill just to fix single quote within patterns.
We already support nested double-quotes in patterns correctly.  As
single quotes cannot nest, it should be an easy fix.


Single quotes indeed cannot nest, but you do need to reliably determine 
when a single quote is a special character, which gets very tricky very 
quickly with nested substitutions.


In "${x+''}", the ' is the literal ' character. In "${x#''}", the ' is a 
quote character. This part is easy, this part is just a matter of 
setting another variable when the parse of the substitution starts.


But in "${x+${y-}''}", the ' is the literal ' character. In 
"${x#${y-}''}", the ' is a quote character. This part is hard. If the 
above is done simply using another local variable, then the parse of the 
nested ${y-} would clobber that variable.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 08:49, Herbert Xu wrote:

On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:

On 01/03/2018 00:04, Harald van Dijk wrote:

$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the
pattern is considered unquoted regardless of the outer quotation marks.
Because of that, the single quote characters should not be taken
literally, but should be taken as quoting the y. ksh, posh and zsh agree
with bash.


Unfortunately, this causes another problem with all of the backslash
approaches so far:

   x=''; printf "%s\n" "${x#''}"

This should print a blank line. (bash, ksh, posh and zsh agree.)

Here, dash's parser stores '$\$\', where $ is a control character. preglob
would need to turn this into . The problem is again that preglob
cannot increase the string length. Perhaps the parser needs to store this as
'$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it
requires some more invasive changes.


These are different issues.  dash's parser currently does not
understand nested quoting in patterns at all.  That is, if your
parameter expansion are within double quotes, then dash at the
parser level will consider the pattern to be double-quoted.  Thus
any nested single-quotes will be literals instead of actual quotes.


That's the same thing though. The problem with the backslashes is also 
that dash sees them as double-quoted when they should be seen as 
unquoted, and the approach taken in commit 
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was 
specifically to *not* fix this in the parser, but to simply have the 
parser record enough information so that quote status can be determined 
and patched up during expansion. It's just that in the case of single 
quotes, expansion was never modified to recognise them. Thinking some 
more, I don't think the parser actually records enough information to 
let that work.



If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.



Cheers,

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-01 Thread Harald van Dijk

On 01/03/2018 00:04, Harald van Dijk wrote:

$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the 
pattern is considered unquoted regardless of the outer quotation marks. 
Because of that, the single quote characters should not be taken 
literally, but should be taken as quoting the y. ksh, posh and zsh agree 
with bash.


Unfortunately, this causes another problem with all of the backslash 
approaches so far:


  x=''; printf "%s\n" "${x#''}"

This should print a blank line. (bash, ksh, posh and zsh agree.)

Here, dash's parser stores '$\$\', where $ is a control character. 
preglob would need to turn this into . The problem is again that 
preglob cannot increase the string length. Perhaps the parser needs to 
store this as '$\$\$\$\', $ being either CTLESC or that new CTLBACK? 
Either way, it requires some more invasive changes.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-28 Thread Harald van Dijk

On 2/28/18 11:14 PM, Denys Vlasenko wrote:

Guys, while you work on fixing this, consider taking a look at
some more parsing cases in this bbox bug:

https://bugs.busybox.net/show_bug.cgi?id=10821

I suppose some of them may fail in dash too (I did not test, sorry).


$'...' isn't supported in dash, but the underlying problem does appear 
to exist in dash too:


$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the 
pattern is considered unquoted regardless of the outer quotation marks. 
Because of that, the single quote characters should not be taken 
literally, but should be taken as quoting the y. ksh, posh and zsh agree 
with bash.


But: POSIX does not say the same for +/-/..., so dash is correct there:

$ bash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

$ dash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

Because of that, for that report's follow-up comment about

  (unset x; echo "${x-$'\x41'}")

I would not have expected this to be taken as a $'...' string. ksh 
(which does support $'...' strings too) prints the literal text $'\x41', 
and so does bash if invoked as sh.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 09:40, Harald van Dijk wrote:

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


And hopefully I've got a good example now:

$ touch '\www'
$ src/dash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\www

In both $z and ${x#$z} / ${y#$z}, $z is unquoted. In $z, it's 
interpreted as a literal backslash followed by a * wildcard. In ${x#$z}, 
it's interpreted as an escaped * character. I don't understand why these 
cases should be treated differently.


ksh agrees with dash.

bash also behaves strangely here. In both cases, it treats the * as 
escaped by the preceding backslash, but in $z, the backslash is 
preserved, whereas in ${x#$z} / ${y#$z}, the backslash is removed:


$ bash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\*

posh is more consistent: in both $z and in ${x#$z} / ${y#$z}, $z is 
treated as a literal backslash followed by a * wildcard:


$ posh -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
*a
*a
\www

I'm having trouble telling from the spec what the expected behaviour is 
here, and because of that, whether dash's behaviour is already correct 
or needs to be modified.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Never mind, this is a very bad test. I forgot about echo's backslash 
handling. Had my terminal emulator given any feedback on \a, I might 
have noticed that before sending. This test doesn't show anything wrong 
in this area, but I'll check a bit more.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-25 Thread Harald van Dijk

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 5:52 PM, Herbert Xu wrote:

On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote:


It seems like the new control character doesn't get escaped in one place
where it should be, and gets lost because of that:

Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X


Hmm, it works here.  Can you check that your Makefile is up-to-date


It was.


and the generated syntax.c looks like this:

const char basesyntax[] = {
   CEOF,CSPCL,   CWORD,   CCTL,
   CCTL,CCTL,CCTL,CCTL,
   CCTL,CCTL,CCTL,CCTL,


Thanks, that was what was wrong. Although mksyntax did get re-executed 
because of your Makefile change, re-executing it didn't help: mksyntax 
doesn't use parser.h at run time, only at build time. So I think the 
Makefile.am change should instead be:


--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax
 signames.c: mksignames
./$^

-mksyntax: token.h
+mksyntax: parser.h token.h

 $(HELPERS): %: %.c
$(COMPILE_FOR_BUILD) -o $@ $<

Cheers,
Harald van Dijk

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 1:33 AM, Herbert Xu wrote:

On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote:

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!


Good work guys.  However, could you check to see if this patch
works too? It passes all my tests so far.


It seems like the new control character doesn't get escaped in one place 
where it should be, and gets lost because of that:


Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-21 Thread Harald van Dijk

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-19 Thread Harald van Dijk

On 2/18/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in 
_rmescapes. It passes your test case and mine, but I'll do more 
extensive testing.


It causes preglob's string to potentially grow larger than the 
original. When called with RMESCAPE_ALLOC, that can be handled by 
increasing the buffer size, but preglob also gets called without 
RMESCAPE_ALLOC to modify a string in-place. That's never going to work 
with this approach. Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here.


It required one obvious additional trivial change to the CHECKSTRSPACE 
invocation, but with that added, the attached passed all testing I could 
think of. Does this look okay to include, did I miss something, or is 
there perhaps a better alternative?


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..a847b2e 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -909,7 +909,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 #endif
 		CHECKEND();	/* set c to PEOF if at end of here document */
 		for (;;) {	/* until end of line or end of word */
-			CHECKSTRSPACE(4, out);	/* permit 4 calls to USTPUTC */
+			CHECKSTRSPACE(5, out);	/* permit 5 calls to USTPUTC */
 			switch(syntax[c]) {
 			case CNL:	/* '\n' */
 if (syntax == BASESYNTAX)
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-18 Thread Harald van Dijk

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


It causes preglob's string to potentially grow larger than the original. 
When called with RMESCAPE_ALLOC, that can be handled by increasing the 
buffer size, but preglob also gets called without RMESCAPE_ALLOC to 
modify a string in-place. That's never going to work with this approach. 
Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here. I'm currently testing the attached.


To be honest, FreeBSD sh's approach, keeping a syntax stack to detect 
characters' meaning reliably at parse time, feels more elegant to me 
right now, but that requires invasive and therefore risky changes to 
dash's code.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..bb16a46 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


  1   2   >