Re: [PATCH] shell: Group readdir64/dirent64 with open64

2020-07-21 Thread Martijn Dekker

Op 22-07-20 om 04:58 schreef Herbert Xu:

Thanks for the report, does this patch help?


Yup, all fixed. Thanks.

- M.

--
||  modernish -- harness the shell
||  https://github.com/modernish/modernish
||
||  KornShell lives!
||  https://github.com/ksh93/ksh


dash 0.5.11.1 doesn't build on macOS

2020-07-21 Thread Martijn Dekker

This commit introduced the build failure:
3e3e7af1a49273a5e49d50565b3b079a2ab19142

The first error is:
expand.c:1365:9: error: incomplete definition of type 'struct dirent64'
if (dp->d_name[0] == '.' && ! matchdot)
~~^

Full build log:

$ uname -a
Darwin breedzicht 18.7.0 Darwin Kernel Version 18.7.0: Thu Jun 18 
20:50:10 PDT 2020; root:xnu-4903.278.43~1/RELEASE_X86_64 x86_64

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1

Apple clang version 11.0.0 (clang-1100.0.33.12)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ CFLAGS=-Os ./configure && make
checking for a BSD-compatible install... /opt/local/bin/ginstall -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /opt/local/bin/gmkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking for build system compiler... gcc
checking for __attribute__((__alias__()))... no
checking alloca.h usability... yes
checking alloca.h presence... yes
checking for alloca.h... yes
checking paths.h usability... yes
checking paths.h presence... yes
checking for paths.h... yes
checking whether _PATH_BSHELL is declared... yes
checking whether _PATH_DEVNULL is declared... yes
checking whether _PATH_TTY is declared... yes
checking whether isblank is declared... yes
checking size of intmax_t... 8
checking size of long long int... 8
checking whether PRIdMAX is declared... yes
checking for bsearch... yes
checking for faccessat... yes
checking for getpwnam... yes
checking for getrlimit... yes
checking for isalpha... yes
checking for killpg... yes
checking for mempcpy... no
checking for sigsetmask... yes
checking for stpcpy... yes
checking for strchrnul... no
checking for strsignal... yes
checking for strtod... yes
checking for strtoimax... yes
checking for strtoumax... yes
checking for sysconf... yes
checking for signal... yes
checking for stat64... yes
checking for open64... no
checking for stat::st_mtim... no
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: executing depfiles commands
make  all-recursive
Making all in src
  CC   builtins.def
  GEN  builtins.h
  CC   mknodes
  GEN  nodes.h
  GEN  token.h
  CC   mksyntax
  GEN  syntax.h
make  all-am
  CC   alias.o
  CC   arith_yacc.o
  CC   arith_yylex.o
  CC   cd.o
cd.c:135:7: warning: 'stat64' is deprecated: first deprecated in macOS 
10.6 [-Wdeprecated-declarations]

if (stat64(p, ) >= 0 && S_ISDIR(statb.st_mode)) {
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/stat.h:402:9: 
note: 'stat64' has been explicitly marked deprecated here
int stat64(const char *, struct stat64 *) 
__OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_5, __MAC_10_6, __IPHONE_NA, 
__IPHONE_NA);

^
1 warning generated.
  CC   error.o
  CC   eval.o
  CC   exec.o
exec.c:346:11: warning: 'stat64' is deprecated: first deprecated in 
macOS 10.6 [-Wdeprecated-declarations]

while (stat64(name, ) < 0) {
   ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/stat.h:402:9: 
note: 'stat64' has been explicitly marked deprecated here
int stat64(const char *, struct stat64 *) 

Re: [PATCH] dash: Fix stack overflow from infinite recursion in script

2019-07-18 Thread Martijn Dekker

Op 18-07-19 om 20:27 schreef Jason Bowen:

A recursion limit is not without precedent. CPython sets a platform-
dependent recursion limit for this same purpose:

https://docs.python.org/3/library/sys.html#sys.setrecursionlimit

But indeed, the limit is "artificial" or arbitrary. It should just be
"enough" that it isn't hit often with typical use.


The arbitrariness of a hard-coded limit is indeed a problem. OK, 
crashing with a segfault is ugly because it makes the user suspect the 
shell is buggy, not the script. On the other hand, with a 'reasonable' 
limit, a script might still segfault on very small systems, while that 
limit disallows full use of system resources on large systems.


Regarding precedent: as far as I know, among POSIXy shells, a recursion 
limit is currently only implemented in zsh, with a user-settable 
FUNCNEST variable.


If such a feature were to be implemented, I think a configurable limit 
is preferable to a hard-coded limit, and IMHO zsh's precedent should be 
followed (including the name of the variable).


$ zsh -c 'f() { f; }; f'
f: maximum nested function level reached; increase FUNCNEST?

Of course, if you set FUNCNEST to a very large value, the stack will 
still overflow.


$ zsh -c 'FUNCNEST=99; f() { f; }; f'
Segmentation fault: 11

Thanks,

- M.

--
modernish -- harness the shell
https://github.com/modernish/modernish


Re: [PATCH] dash: Fix stack overflow from infinite recursion in script

2019-07-18 Thread Martijn Dekker

Op 18-07-19 om 22:15 schreef Martijn Dekker:
Regarding precedent: as far as I know, among POSIXy shells, a recursion 
limit is currently only implemented in zsh, with a user-settable 
FUNCNEST variable.


I was mistaken: both AT ksh93 and NetBSD 8.1 sh have a hard-coded 
limit of 1000.


- M.

--
modernish -- harness the shell
https://github.com/modernish/modernish


[BUG] incomplete whitespace removal of unquoted expansion

2019-02-16 Thread Martijn Dekker
For unset foo:

dash-git
$ src/dash -c 'set -- ${foo- bar }; echo "[$1]"'
[bar ]

Release versions of dash and all other shells output:
[bar]

The change in behaviour appears to have been introduced by commit
3cd5386 ("expand: Do not reprocess data when expanding words").

- Martijn

-- 
modernish -- harness the shell
https://github.com/modernish/modernish


Re: [PATCH v3] redir: Handle nested exec within REALLY_CLOSED redirection

2019-01-17 Thread Martijn Dekker
Op 16-01-19 om 14:32 schreef Herbert Xu:
> So I'm going to go for a more complicated fix:

The v3 patch introduces a bug:

begin test script
init() {
exec 8&-

{
init
: <&8 && echo ok
} 8<&-
end test script

Actual output:
test2.sh: 9: test2.sh: 8: Bad file descriptor

Expected output:
ok

The 'read' gets a 'bad file descriptor', so the effect of the 'exec'
from the init function is lost.

Interestingly, removing either the ">&-" from the function definition
block, or the "8<&-" from the other braces block, or both, makes the
error go away.

- Martijn


Re: [PATCH v3] redir: Handle nested exec within REALLY_CLOSED redirection

2019-01-16 Thread Martijn Dekker
Op 16-01-19 om 22:39 schreef Martijn Dekker:
> Op 16-01-19 om 14:32 schreef Herbert Xu:
>> Thanks for the patch.  I took a deeper look into the history of
>> the bug and it turned out that I added REALLY_CLOSED as an
>> optimisation in order to avoid an unnecessary close(2) syscall.
> 
> Does this actually save cycles? I'm probably missing something, but that
> code looks to me like it probably uses more cycles than close(2) going
> 'descriptor not open, return'.

Never mind, stupid question that I should have googled before asking it.
The answer is that a switch to kernel mode and back is expensive.

- M.


Re: [PATCH v3] redir: Handle nested exec within REALLY_CLOSED redirection

2019-01-16 Thread Martijn Dekker
Op 16-01-19 om 14:32 schreef Herbert Xu:
> Thanks for the patch.  I took a deeper look into the history of
> the bug and it turned out that I added REALLY_CLOSED as an
> optimisation in order to avoid an unnecessary close(2) syscall.

Does this actually save cycles? I'm probably missing something, but that
code looks to me like it probably uses more cycles than close(2) going
'descriptor not open, return'.

- Martijn


Re: [BUG] dash hangs on 'eval' syntax error in dot script

2019-01-10 Thread Martijn Dekker
Op 10-01-19 om 11:37 schreef Martijn Dekker:
> In a dot script sourced with 'command .' (which is useful to avoid
> exiting if the script doesn't exist), triggering a syntax error in an
> 'eval' in a subshell causes dash to hang at the end of the main script.

In fact, 'eval' doesn't appear related. I can also trigger the bug by
triggering an error in another special builtin:

command . /dev/stdin <

Re: [BUG] 'fc -s' infinite loop

2019-01-01 Thread Martijn Dekker

Op 01-01-19 om 20:02 schreef Harald van Dijk:

On 01/01/2019 18:59, Martijn Dekker wrote:

Op 01-01-19 om 17:24 schreef Harald van Dijk:
The immediate problem can be fixed either by dropping support for the 
non-standard and undocumented fc -s [...]


Actually, it's standard:


I didn't say fc -s was non-standard, I said fc -s first last was 
non-standard. POSIX does not allow specifying last in combination with -s.


Woops. Sorry about that overly hasty misinterpretation.

- M.


Re: [BUG] 'fc -s' infinite loop

2019-01-01 Thread Martijn Dekker

Op 01-01-19 om 17:24 schreef Harald van Dijk:
The immediate problem can be fixed either by dropping support for the 
non-standard and undocumented fc -s [...]


Actually, it's standard:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/fc.html
| -s
| Re-execute the command without invoking an editor.

- M.



[BUG] 'fc -s' infinite loop

2019-01-01 Thread Martijn Dekker
On dash and on gwsh, the 'fc -s' command, that re-executes a command 
from the history, executes the command repeatedly in an infinite loop. 
It should execute it just once.


$ dash
$ echo foo
foo
$ fc -l
1 echo foo
$ fc -s 1
echo foo
foo
echo foo
foo
echo foo
foo
echo foo
foo
echo foo
foo
echo foo
foo
[...]

Repeatable as of dash-0.5.7 (on 0.5.6.1 and early it's also broken but 
in a different way).


- M.


[PATCH v2] Create block-local FD state when appending redirection closing the FD

2018-12-13 Thread Martijn Dekker

[Corrected patch due to an oops in first take. Apologies.]

Op 19-09-18 om 05:25 schreef Herbert Xu:

Harald van Dijk  wrote:

On 23/04/2018 19:56, Martijn Dekker wrote:

$ dash -c '{ exec 8

What surprises me most is that dash has code written specifically to
keep the fd closed. dash would be smaller and simpler if it behaved the
way you expected and the way most other shells behave: just remove all
traces of REALLY_CLOSED.


Probably a lingering case of bug-compatibility with the original Bourne 
shell, which behaves this way (confirmed on a VM with 1988 Xenix).



So did anything happen on the bash front? I'm happy to change if
bash moves in the same direction.


Yes. According to the bash changelog, Chet fixed it in git last 30th of 
April, meaning it'll be in bash 5.0.


Patch attached, as per Harald's suggestion.

- Martijn

diff --git a/src/redir.c b/src/redir.c
index e67cc0a..1e3feac 100644
--- a/src/redir.c
+++ b/src/redir.c
@@ -57,7 +57,6 @@
 #include "error.h"
 
 
-#define REALLY_CLOSED -3   /* fd that was closed and still is */
 #define EMPTY -2   /* marks an unused slot in redirtab */
 #define CLOSED -1  /* fd opened for redir needs to be closed */
 
@@ -136,10 +135,6 @@ redirect(union node *redir, int flags)
}
}
 
-   if (i == newfd)
-   /* Can only happen if i == newfd == CLOSED */
-   i = REALLY_CLOSED;
-
*p = i;
}
 
@@ -352,7 +347,6 @@ popredir(int drop)
close(i);
break;
case EMPTY:
-   case REALLY_CLOSED:
break;
default:
if (!drop)


Re: [BUG] failure to push/restore closed file descriptor

2018-12-13 Thread Martijn Dekker

Op 19-09-18 om 05:25 schreef Herbert Xu:

Harald van Dijk  wrote:

On 23/04/2018 19:56, Martijn Dekker wrote:

$ dash -c '{ exec 8

What surprises me most is that dash has code written specifically to
keep the fd closed. dash would be smaller and simpler if it behaved the
way you expected and the way most other shells behave: just remove all
traces of REALLY_CLOSED.


Probably a lingering case of bug-compatibility with the original Bourne 
shell, which behaves this way (confirmed on a VM with 1988 Xenix).



So did anything happen on the bash front? I'm happy to change if
bash moves in the same direction.


Yes. According to the bash changelog, Chet fixed it in git last 30th of 
April, meaning it'll be in bash 5.0.


Patch attached, as per Harald's suggestion.

- Martijn
diff --git a/src/redir.c b/src/redir.c
index e67cc0a..03b43c8 100644
--- a/src/redir.c
+++ b/src/redir.c
@@ -57,7 +57,6 @@
 #include "error.h"
 
 
-#define REALLY_CLOSED -3   /* fd that was closed and still is */
 #define EMPTY -2   /* marks an unused slot in redirtab */
 #define CLOSED -1  /* fd opened for redir needs to be closed */
 
@@ -136,10 +135,6 @@ redirect(union node *redir, int flags)
}
}
 
-   if (i == newfd)
-   /* Can only happen if i == newfd == CLOSED */
-   i = REALLY_CLOSED;
-
*p = i;
}
 
@@ -352,8 +347,6 @@ popredir(int drop)
close(i);
break;
case EMPTY:
-   case REALLY_CLOSED:
-   break;
default:
if (!drop)
dup2(rp->renamed[i], i);


Re: [PATCH] clear_traps: reset savestatus

2018-11-29 Thread Martijn Dekker

Op 29-11-18 om 21:39 schreef Harald van Dijk:
By the way, my change has an unintended but possibly acceptable side 
effect:


   trap '(trap "echo exit" EXIT; :)' EXIT

This prints nothing with current dash, but prints "exit" with my change. 
It also prints "exit" in ksh, mksh, posh, and bosh.


IMHO, that's not a side effect, but evidence that the bug was properly 
fixed. A subshell is defined as a duplicate of the original shell 
process except with its traps reset[*], so setting a trap should work as 
normal within it.


It doesn't work in bash, yash or zsh, which I believe is a bug that I 
should report to them.


- M.

[*] "A subshell environment shall be created as a duplicate of the shell 
environment, except that signal traps that are not being ignored shall 
be set to the default action."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_12
(I'm going to assume that "signal traps" here is meant to include the 
EXIT trap, because not clearing it for subshells would be insane.)


Re: [PATCH] clear_traps: reset savestatus

2018-11-29 Thread Martijn Dekker

Op 29-11-18 om 20:56 schreef Harald van Dijk:
That's part of it, but not the whole story. Herbert Xu's comment about 
exitshell() was right, that is still a problem. A testcase for this is


   trap '(true) || echo bug' EXIT


Yes. Thanks. I hadn't thought about that.

The test case above is not quite complete. It needs to make sure the 
exit status is non-zero before executing the trap:


$ src/dash -c "trap '(true) || echo bug' EXIT; false"
bug

- Martijn


[PATCH] clear_traps: reset savestatus

2018-11-29 Thread Martijn Dekker

Op 27-11-18 om 17:24 schreef Martijn Dekker:

Big bad bug: it appears that subshells always return status 0 in traps.


As posted elsewhere, looks like the problem is simply that savestatus 
("/* exit status of last command outside traps */") isn't reset to -1 
upon resetting traps when forking a subshell.


Reposting patch for Patchwork purposes:

diff --git a/src/trap.c b/src/trap.c
index ab0ecd4..7740955 100644
--- a/src/trap.c
+++ b/src/trap.c
@@ -168,6 +168,7 @@ clear_traps(void)
}
}
trapcnt = 0;
+   savestatus = -1;
INTON;
 }


Thanks,

- M.


Re: [BUG] exit status of subshells in traps is always 0

2018-11-29 Thread Martijn Dekker

Op 27-11-18 om 17:24 schreef Martijn Dekker:

Big bad bug: it appears that subshells always return status 0 in traps.

Bug found in dash 0.5.9 and later.


In fact, the bug exists on at least dash 0.5.6 and possibly earlier 
(earlier versions don't compile for me) if the exit occurs due to a 
failure in an assignment or special builtin.


$ cat BUG_TRAPSUB0.sh
(trap '(! :) && echo BUG1' EXIT)
(trap '(false) && echo BUG2' EXIT)
(trap 'readonly foo=bar; (foo=baz) && echo BUG3' EXIT)
(trap '(set -o bad@option) && echo BUG4' EXIT)
$ dash-0.5.6 BUG_TRAPSUB0.sh
BUG_TRAPSUB0.sh: 1: foo: is read only
BUG3
set: 1: Illegal option -o bad@option
BUG4
$ # [Same for 0.5.6.1, 0.5.7, 0.5.8]
$ dash-0.5.9 BUG_TRAPSUB0.sh
BUG1
BUG2
BUG_TRAPSUB0.sh: 1: BUG_TRAPSUB0.sh: foo: is read only
BUG3
BUG_TRAPSUB0.sh: 1: set: Illegal option -o bad@option
BUG4
$ # [Same for 0.5.9.1, 0.5.10, 0.5.10.1, 0.5.10.2]

Thanks,

- M.


Re: eval: Only restore exit status on exit/return

2018-11-29 Thread Martijn Dekker

Op 29-11-18 om 15:48 schreef Herbert Xu:

The problem is that in evalsubshell we end up in exitshell again
which restores the old exit status.  So we need to come up with
a way to differentiate the exitshell from the original shell vs.
a subshell.


Isn't it much simpler than that? Upon forking a subshell, traps are 
reset, so it would make sense that any flag that says "we are currently 
in the process of executing a trap" is also reset.


Which, from a look at the source code, seems to be the -1 value of 
savestatus. So this oneliner fixes it for me:


diff --git a/src/trap.c b/src/trap.c
index ab0ecd4..7740955 100644
--- a/src/trap.c
+++ b/src/trap.c
@@ -168,6 +168,7 @@ clear_traps(void)
}
}
trapcnt = 0;
+   savestatus = -1;
INTON;
 }


Thanks,

- M.


Re: eval: Only restore exit status on exit/return

2018-11-29 Thread Martijn Dekker

Op 29-11-18 om 12:33 schreef Herbert Xu:

Thanks for the report.  This patch should fix the problem:


Doesn't work for me, in fact it seems to make no difference.

Here are a few more test cases.

(trap '(! :) && echo BUG1' EXIT)
(trap '(false) && echo BUG2' EXIT)
(trap 'readonly foo=bar; (foo=baz) && echo BUG3' EXIT)
(trap '(set -o bad@option) && echo BUG4' EXIT)

- M.


[BUG] exit status of subshells in traps is always 0

2018-11-27 Thread Martijn Dekker

Big bad bug: it appears that subshells always return status 0 in traps.

Bug found in dash 0.5.9 and later.

$ src/dash -c 'trap "(false) && echo BUG" INT; kill -s INT $$'
BUG
$ src/dash -c 'trap "(false) && echo BUG" EXIT'
BUG
$ src/dash -c 'trap "(false); echo \$?" EXIT'
0

Workaround: if an explicit 'exit' is given, it works as expected.

$ src/dash -c 'trap "(false; exit \$?); echo \$?" EXIT'
1

- Martijn



[BUG] ${#v} aborts string processing

2018-11-25 Thread Martijn Dekker

Another bug introduced by 3cd5386, and not fixed in current git.
String length expansion ${#v} aborts string processing.

$ ./dash -c 'v=abc; echo ab${#v}cd'
ab3

Expected output: ab3cd

Yes, you really need that regression test suite. I can help you start 
one off. Would you be willing to consider a patch to that effect?


- M.


Re: [PATCH v2] expand: Fix multiple issues with EXP_DISCARD in evalvar

2018-11-14 Thread Martijn Dekker
I encountered another bug, introduced by 3cd5386 and not fixed by v2 of 
this patch: the presence of a length-counting expansion like ${#foo} in 
a string causes the rest of the string to be discarded.


$ src/dash -c 'foo=bar; echo "baz ${#foo} quux"'
baz 3
$ src/dash -c 'foo=bar; echo baz\ ${#foo}\ quux'
baz 3

(expected outout: baz 3 quux)

- Martijn


[PATCH] update .gitignore

2018-09-06 Thread Martijn Dekker

Ignore .deps and .dirstamp in all directories.

diff --git a/.gitignore b/.gitignore
index 579bd47..e349901 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,11 +13,12 @@ Makefile.in

 # generated by configure
 Makefile
+.deps
+.dirstamp
 /config.cache
 /config.h
 /config.log
 /config.status
-/src/.deps/
 /stamp-h1

 # generated by make



Re: ${var+set}, ${var:+nonempty} broken in current git

2018-09-06 Thread Martijn Dekker

Op 06-09-18 om 07:07 schreef Herbert Xu:

On Wed, Sep 05, 2018 at 06:21:36PM +0200, Martijn Dekker wrote:

With this patch applied, the following breakage still occurs:
$ src/dash -u -c 'unset foo bar; echo ${foo+${bar}}'
src/dash: 1: bar: parameter not set
(expected: empty line, no error)
...which seems to suggest that ${bar} is evaluated even though foo is unset.

This patch should fix it.


Yes, confirmed.

But now I'm encountering another, similar bug, that was a bit harder to 
track down:


$ src/dash -c 'foo=bar; echo ${foo=BUG}; echo $foo'
barBUG
bar
$ src/dash -c 'foo=bar; echo ${foo:=BUG}; echo $foo'
barBUG
bar

Expected output: 'bar' twice in both cases. The ${foo=BUG} and 
${foo:=BUG} expansions fail to discard the word 'BUG' if foo is set.


- Martijn


${var+set}, ${var:+nonempty} broken in current git

2018-09-02 Thread Martijn Dekker
Commit 3cd538634f71538370f5af239f342aec48b7470b broke these:

$ src/dash -c 'unset var; echo ${var+set}'
set
$ src/dash -c 'var=; echo ${var:+nonempty}'
nonempty

HTH,

- M.


Re: Command substitution in here-documents

2018-08-13 Thread Martijn Dekker
Op 13-08-18 om 18:30 schreef Ron Yorston:
> Simon Ser wrote:
>> On August 13, 2018 4:22 PM, Martijn Dekker  wrote:
>>> -   The latest release is 0.5.10.2. I can't reproduce the bug at all in
>>> 0.5.10, 0.5.10.1 or 0.5.10.2.
>>
>> This is interesting. I can reproduce with the latest commit in master.
> 
> I'm with Simon, the bug is present in current master for me.

Yes, interesting.

> This is on x86_64 Linux.  Which platform are you using Martijn?

I compile my test shells on x86_64 Mac OS X 10.11.6.
Compiler: Apple LLVM version 8.0.0 (clang-800.0.42.1)

Tried it with current master, can't reproduce it there either.

- M.


Re: Command substitution in here-documents

2018-08-13 Thread Martijn Dekker
Op 11-08-18 om 12:08 schreef Ron Yorston:
> - Fixing this (other than using the blunt instrument of reverting the
>   faulty commit) is beyond my pay grade.  Someone with a better
>   understanding of the code will need to take a look.

My observations:

- The latest release is 0.5.10.2. I can't reproduce the bug at all in
0.5.10, 0.5.10.1 or 0.5.10.2.

- I can only reproduce the bug on dash 0.5.9 and 0.5.9.1 in interactive
mode, inputting the commands manually. I cannot reproduce it when
running a script (including a dot script sourced using a '.' command in
interactive mode).

- The error message produced can be ... interesting, e.g.:
$ dash-0.5.9.1
$ while read -r f; do echo "$f"; done < `echo hi`
> EOF
dash-0.5.9.1: 1: ad -r f; do echo "$f"; done <

Re: [BUG] failure to push/restore closed file descriptor

2018-04-27 Thread Martijn Dekker

Op 27-04-18 om 17:51 schreef Herbert Xu:

On Fri, Apr 27, 2018 at 05:47:27PM +0200, Martijn Dekker wrote:


No, because step 1 doesn't merely close fd 8. It enters a curly braces block
(a compound command) that locally closes fd 8 using a redirection, just like
any other redirection would be local to that compound command. Thus,
restoring the fd state when leaving that block must undo the effect of the
'exec'.

Note that dash already does this correctly if the '8<&-' is replaced by any
other redirection such as '8>/dev/tty'.


That's different as 8 was previously closed and is now open.


POSIX 2.7 Redirection says: "Redirection is used to open and close files for
the current shell execution environment [...] or for any command". Note that
"any command" includes compound commands such as curly braces blocks.


Well, I don't see anything in POSIX that requires us to close fd 8.
Can you please point it out to me please?


I will ask more authoritative people on the Austin Group for 
clarification and get back to you.


Even if it turns out it's not "required", though, I would think the 
current behaviour breaks obvious expectations.


To guarantee the expected behaviour of pushing that fd onto the shell's 
internal stack for restoring when leaving the compound command, you'd 
need to push it twice in two different states in two nested compound 
command blocks, for instance:


  { { exec 8/dev/null

The need for that workaround seems like evidence of a bug to me.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [v5] quote arguments in xtrace output

2018-04-19 Thread Martijn Dekker

Op 19-04-18 om 12:14 schreef Herbert Xu:

Sorry, but one of the key goals of dash is to be as small as
possible.  So that means no features unless absolutely necessary.

As such I cannot accept this patch as it is.


Would you accept it if it were a configure option, disabled by default 
(like linking with libedit)?


- Martijn

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash.1 - correct typos, iff -> if

2018-03-29 Thread Martijn Dekker
Op 27-03-18 om 20:23 schreef Larry Hynes:
> Funny, I did wonder if it might be a contraction, but I did find
> it odd that it's not mentioned or explained. I'll leave it be, if
> you all are happy enough to keep it 'as is', or can resubmit if you
> think it's warranted.

I think the simple fact that it came up here is evidence that this is
too jargony for a manual. Patch attached.

- M.
diff --git a/src/dash.1 b/src/dash.1
index 1056285..c452c3f 100644
--- a/src/dash.1
+++ b/src/dash.1
@@ -606,11 +606,11 @@ and
 .Dq ||
 are AND-OR list operators.
 .Dq &&
-executes the first command, and then executes the second command iff the
-exit status of the first command is zero.
+executes the first command, and then executes the second command if and only
+if the exit status of the first command is zero.
 .Dq ||
-is similar, but executes the second command iff the exit status of the first
-command is nonzero.
+is similar, but executes the second command if and only if the exit status
+of the first command is nonzero.
 .Dq &&
 and
 .Dq ||


[PATCH] [v5] quote arguments in xtrace output

2018-03-29 Thread Martijn Dekker
The xtrace (set -x) output in dash is a bit of a pain, because arguments
containing whitespace aren't quoted. This can it impossible to tell
which bit belongs to which argument:

$ dash -c 'set -x; printf "%s\n" one "two three" four'
+ printf %s\n one two three four
one
two three
four

Another disadvantage is you cannot simply copy and paste the commands
from this xtrace output to repeat them for debugging purposes.

The attached patch shell-quotes xtrace arguments that contain
non-shell-safe characters or are identical to reserved words.

A parameter is added to single_quote() indicating whether quoting should
be unconditional (0) or conditional upon the string containing
characters that are not shell-safe (1). This leaves the unconditional
quoting in the output of commands like 'trap' and 'set' unchanged while
avoiding excessive quoting in xtrace output.

Shell-safe characters are defined as this set:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz%+,./:@_^-

Quoting of assignments needs to be handled specially; in order for the
output to be suitable for re-entry into the shell, only the part after
the "=" must be quoted. For this purpose, eprintlist() was split in into
two functions, eprintvarlist() to print the variable assignments and
eprintarglist() to print the rest.

I've had this patch in use for a year and found it to work reliably, so
I hope it can be considered. The only differences with v4, sent on 18
March 2017, is that I fixed a line with trailing whitespace, and that it
should now be good for Patchwork.

Thanks,

- M.
diff --git a/src/alias.c b/src/alias.c
index daeacbb..7a88b44 100644
--- a/src/alias.c
+++ b/src/alias.c
@@ -197,7 +197,7 @@ freealias(struct alias *ap) {
 
 void
 printalias(const struct alias *ap) {
-   out1fmt("%s=%s\n", ap->name, single_quote(ap->val));
+   out1fmt("%s=%s\n", ap->name, single_quote(ap->val, 0));
 }
 
 STATIC struct alias **
diff --git a/src/eval.c b/src/eval.c
index 7498f9d..aba305e 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -95,7 +95,8 @@ STATIC int evalcommand(union node *, int);
 STATIC int evalbltin(const struct builtincmd *, int, char **, int);
 STATIC int evalfun(struct funcnode *, int, char **, int);
 STATIC void prehash(union node *);
-STATIC int eprintlist(struct output *, struct strlist *, int);
+STATIC int eprintvarlist(struct output *, struct strlist *, int);
+STATIC void eprintarglist(struct output *, struct strlist *, int);
 STATIC int bltincmd(int, char **);
 
 
@@ -786,8 +787,8 @@ evalcommand(union node *cmd, int flags)
out = 
outstr(expandstr(ps4val()), out);
sep = 0;
-   sep = eprintlist(out, varlist.list, sep);
-   eprintlist(out, arglist.list, sep);
+   sep = eprintvarlist(out, varlist.list, sep);
+   eprintarglist(out, arglist.list, sep);
outcslow('\n', out);
 #ifdef FLUSHERR
flushout(out);
@@ -1107,16 +1108,35 @@ execcmd(int argc, char **argv)
 
 
 STATIC int
-eprintlist(struct output *out, struct strlist *sp, int sep)
+eprintvarlist(struct output *out, struct strlist *sp, int sep)
 {
while (sp) {
const char *p;
+   int i;
 
-   p = " %s" + (1 - sep);
+   if (sep)
+   outfmt(out, " ");
sep |= 1;
-   outfmt(out, p, sp->text);
+   i = 0;
+   while (sp->text[i] != '=' && sp->text[i] != '\0')
+   outfmt(out, "%c", sp->text[i++]);
+   if (sp->text[i] == '=')
+   outfmt(out, "=%s", single_quote(sp->text+i+1, 1));
sp = sp->next;
}
 
return sep;
 }
+
+STATIC void
+eprintarglist(struct output *out, struct strlist *sp, int sep)
+{
+   while (sp) {
+   const char *p;
+
+   p = " %s" + (1 - sep);
+   sep |= 1;
+   outfmt(out, p, single_quote(sp->text, 1));
+   sp = sp->next;
+   }
+}
diff --git a/src/mystring.c b/src/mystring.c
index de624b8..e13b1a6 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -55,6 +55,7 @@
 #include "memalloc.h"
 #include "parser.h"
 #include "system.h"
+#include "token_vars.h"
 
 
 char nullstr[1];   /* zero length string */
@@ -184,13 +185,20 @@ is_number(const char *p)
 
 /*
  * Produce a possibly single quoted string suitable as input to the shell.
- * The return string is allocated on the stack.
+ * If 'conditional' is nonzero, quoting is only done if the string contains
+ * non-shellsafe characters, or is identical to a shell keyword (reserved
+ * word); if it is zero, quoting is always done.
+ * If quoting was done, the return string is allocated on the stack,
+ * otherwise a pointer to the original string is returned.
  */
 
 char *
-single_quote(const char *s) {
+single_quote(const char *s, int conditional) {
char *p;
 
+   if (conditional && *s != '\0' && 

Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
Op 26-03-18 om 14:12 schreef Harald van Dijk:
> On 26/03/2018 13:57, Martijn Dekker wrote:
>> I don't see any inconsistency. Expansions are consistently treated
>> differently within 'case' than outside it. Among other things,
>> expansions within 'case' are *not* subject to pathname expansion; it's
>> string pattern matching using glob patterns, which is something
>> completely different.
> 
> It's not something completely different. Pathname expansion is defined
> in terms of pattern matching (the pattern matching used in e.g. case
> statements), plus a specific set of differences. See 2.6.6 Pathname
> Expansion:
> 
>> After field splitting, if set -f is not in effect, each field in the
>> resulting command line shall be expanded using the algorithm described
>> in Pattern Matching Notation, qualified by the rules in Patterns Used
>> for Filename Expansion.
> 
> That specific set of differences, 2.13.3 Patterns Used for Filename
> Expansion, doesn't include different treatment of backslashes.

I see your point now. You're absolutely right.

Hmmm...

If we backslash-escape a glob character, '?':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o\?b*'\''; printf %s $p'
The backslash is correctly honoured by:
bash 2.05b through git: _foo?bar_
dash 0.5.5.1: _foo?bar_
dash-hvdijk: _foo?bar_
zsh as sh: _foo?bar_
The backslash is *not* honoured by:
dash 0.5.6 through 0.5.9.1: *o\?b*
ksh93: *o\?b*
mksh/lksh: *o\?b*
yash -o posix: *o\?b*

And if we backslash-escape a non-glob character, 'b':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o?\b*'\''; printf %s $p'
The backslash is correctly honoured by:
bash 2.05b through git: _foo?bar_
dash 0.5.5.1: _foo?bar_
dash-hvdijk: _foo?bar_
The backslash is *not* honoured by:
dash 0.5.6 through 0.5.9.1: *o\?b*
ksh93: *o\?b*
mksh/lksh: *o\?b*
yash -o posix: *o\?b*
zsh as sh: *o\?b*

Funny how these results are different from the results I get when doing
the same test with 'case' pattern matching. As you point out, they are
supposed to be subject to the same rules with some modifications *not*
including backslash parsing. So the results should at least be identical
for each shell.

So yes, dash is inconsistent. But given what POSIX says, I think dash
should probably go back to honouring the backslash for pathname
expansion as it did in 0.5.5.1 and does in your fork.

Maybe you should argue the case with the Austin Group. It would be nice
to get clarification on the issue.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
Op 26-03-18 om 12:30 schreef Harald van Dijk:
> With the snipping it's not clear that I was specifically confused by the
> inconsistency.
> 
> I had included another example:
> 
>   pat="/de\v"
>   printf "%s\n" $pat
> 
> I can understand treating backslash as quoted, or treating it as
> unquoted, but not quoted-unless-in-a-case-statement. What justifies this
> one exception?

I don't see any inconsistency. Expansions are consistently treated
differently within 'case' than outside it. Among other things,
expansions within 'case' are *not* subject to pathname expansion; it's
string pattern matching using glob patterns, which is something
completely different.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
Op 25-03-18 om 22:56 schreef Harald van Dijk:
>   case /dev in $pat) echo why ;; esac
> 
> Now, bash and dash say that the pattern does match -- they take the
> backslash as unquoted, allowing it to escape the v. Most other shells
> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
> quoted.
> 
> This doesn't make sense to me, and doesn't match historic practice:
[...]

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| In order from the beginning to the end of the case statement, each
| pattern that labels a compound-list shall be subjected to tilde
| expansion, parameter expansion, command substitution, and arithmetic
| expansion, and the result of these expansions shall be compared
| against the expansion of word, according to the rules described in
| Pattern Matching Notation (which also describes the effect of quoting
| parts of the pattern).

The way I read this, this clearly says that quoting in a pattern
(particularly backslash quoting, which is the only kind specified in
"Pattern Matching Notation") still needs to have the usual effect even
if the pattern results from one or more expansions. But I understand
there are differences of opinion about this. 

It's certainly true that few shells actually act this way, but dash is
one that does, as is Busybox ash -- and so is bash (for the most part;
see further on).

I think *not* acting this way is illogical. Why should 'case' parse glob
characters resulting from expansions, but not the backslashes that could
quote those glob characters? I can see no reason for that.

Note that quoting the expansion, as in
case /dev in "$pat") echo why ;; esac
does what you would expect: the pattern resulting from the expansion is
fully quoted. So with dash and bash you can easily and cleanly have it
either way, unlike with other shells.

(Note that yash, ksh93 and zsh-as-sh act half-baked: backslashes in
patterns resulting from expansions are accepted to quote glob characters
and backslashes themselves, but not any other character. AFAICT, that
behaviour doesn't conform to POSIX no matter which way you slice it.)

[...]
> or are there scenarios where it's important to treat an expanded
> backslash as unquoted?

Consider this function from modernish (simplified version):

match() {
case $1 in
( $2 ) ;;
( * ) return 1 ;;
esac
}

This allows doing:

if match STRING GLOBPATTERN; then

on every POSIX shell. Very convenient. Easier than 'case', especially if
you want to combine it like: command1 && match foo bar && command3, etc.
And the syntax is not an eyesore, finger-twister and spacing pitfall,
unlike that of '[['.

But consider this:

match 'a\bcd' 'a\?c*'

The '?' is escaped so shouldn't match. This correctly returns a negative
on dash, bash, ksh93, and zsh. It returns a false positive on yash and
mksh. (I haven't tested other shells like FreeBSD sh lately.) This means
on those shells you can't use a backslash to escape a glob character in
a pattern passed as a parameter.

And how about this:

match 'a\bcd' 'a\\bcd'

Same pattern as above. This correctly returns a positive on dash, bash,
ksh93, and zsh-as-sh; a false negative on the rest.

However, this:

match '? *xy' '??\*\x\y'

only correctly return a positive on bash and dash. That's because ksh93
and zsh-as-sh, for patterns resulting from expansions, only parse
backslash quoting for glob characters and the backslash itself, but not
for other characters.

On bash, there is a bug that breaks backslash quoting on match() if the
pattern contains a ^A (\001). So bash can't robustly use the simple
match() for arbitrary patterns. This is *mostly* fixed in the
development version; the fix is good enough for the simple match() to work.

Bottom line is, dash and Busybox ash (but not FreeBSD sh), as well as
the upcoming release version of bash, are currently the only shells that
can reliably use the plain, simple and fast match() above for arbitrary
patterns.

When running on other shells (as determined by an init-time feature test
using a simple match()), modernish match() detects one or more
backslashes in the pattern, and if it finds any, quotes the pattern
except for glob characters and backslashes, so it can safely be
'eval'-ed as a literal pattern. This workaround is effective, but was a
bitch to get right and is not exactly a performance winner.

So yeah, I'd like to keep dash the way it is, please :)

- Martijn
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] don't record empty IFS scan regions if not debugging

2018-03-23 Thread Martijn Dekker
Op 23-03-18 om 15:34 schreef Martijn Dekker:
> Op 23-03-18 om 11:58 schreef Herbert Xu:
>> Thanks, the problem here is that we need to set c to 0 not just
>> when quoted is true but also if sep is 0 since both imply that
>> field splitting is disabled.  Here is an second revision which
>> should fix this by checking (quoted || !sep) instead of just
>> quoted when determining whether we're doing field expansion in $*.
> 
> FWIW, this passes all my tests.

But dash is still recording empty IFS scan regions with nothing for IFS
to split. That still seems strange to me.

It's true that this behaviour has helped expose and fix some bugs in
this instance. However, by the same token, other (future) bugs might be
exposed by avoiding illogical behaviour.

So how about having it both ways? If DEBUG is defined, record empty
split regions. If not, don't waste cycles doing so.

- M.
diff --git a/src/expand.c b/src/expand.c
index c14350c..d6c7946 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -774,7 +774,12 @@ record:
if (!quoted)
goto end;
}
-   recordregion(startloc, expdest - (char *)stackblock(), quoted);
+#ifndef DEBUG
+   if (varlen > 0)
+#endif
+   recordregion(startloc,
+expdest - (char *)stackblock(),
+quoted);
goto end;
}
 


Re: expand: Fix ghost fields with unquoted $@/$*

2018-03-23 Thread Martijn Dekker
Op 23-03-18 om 05:37 schreef Herbert Xu:
> You're right.  The proper fix to this is to ensure that nulonly
> is not set in varvalue for $*.  It should only be set for $@ when
> it's inside double quotes.
> 
> In fact there is another bug while we're playing with $@/$*.
> When IFS is set to a non-whitespace character such as :, $*
> outside quotes won't remove empty fields as it should.
> 
> This patch fixes both problems.

Unfortunately it also introduces a bug with $*.

$ src/dash -c 'IFS=:; v=$*; printf "[%s]\n" "$v"' _ abc 'def ghi' jkl
[abcdef ghijkl]

Expected:
[abc:def ghi:jkl]


$ src/dash -fc 'IFS=:; set ${v=$*}; printf [%s]\\n "$v" "$@"' \
_ abc 'def ghi' jkl
[abcdef ghijkl]
[abcdef ghijkl]

Expected:

[abc:def ghi:jkl]
[abc]
[def ghi]
[jkl]


$ src/dash -fc 'IFS=:; set "${v=$*}"; printf [%s]\\n "$v" "$@"' \
_ "abc" "def ghi" "jkl"
[abcdef ghijkl]
[abcdef ghijkl]

Expected:

[abc:def ghi:jkl]
[abc:def ghi:jkl]


Thanks,

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Martijn Dekker
Op 22-03-18 om 23:16 schreef Harald van Dijk:
> Passing quoted by value to varvalue() and not attempting to modify it
> should therefore, and in my quick testing does, also work to fix the
> original $@ bug.

Then maybe both that and not recording empty split regions should be done.

My C-fu may not be all that strong but I think my reasoning for my patch
makes straightforward logical sense based on the comment description in
expand.c of what recordregion() does:
/*
 * Record the fact that we have to scan this region of the
 * string for IFS characters.
 */
This is pointless if there is nothing to scan.

Thus, making both changes should fix the bug while implementing a slight
optimisation and increasing code integrity.

(Speaking of code integrity, IMO that mess of gotos is in dire need of
refactoring. But I'm not the person to do that.)

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Martijn Dekker
Op 22-03-18 om 20:28 schreef Harald van Dijk:
> On 22/03/2018 03:40, Martijn Dekker wrote:
>> evalvar() records empty expansion results (varlen == 0) as string
>> regions that need to be scanned for IFS characters. This is pointless,
>> because there is nothing to split.
> 
> varlen may be set to -1 when an unset variable is expanded. If it's
> beneficial to avoid recording empty regions to scan for IFS characters,
> should those also be excluded?

Agreed. Updated patch attached.

>> This patch fixes the bug that, given no positional parameters, unquoted
>> $@ and $* incorrectly generate one empty field (they should generate no
>> fields). Apparently that was a side effect of the above.
> 
> This seems weird though. If you want to remove the recording of empty
> regions because they are pointless, then how does removing them fix a
> bug? Doesn't this show that empty regions do have an effect? Perhaps
> they're not supposed to have any effect, perhaps it's a specific
> combination of empty regions and something else that triggers some bug,
> and perhaps that combination can no longer occur with your patch.

The latter is my guess, but I haven't had time to investigate it.

I *did* investigate whether the patch introduces other bugs, and haven't
found any new ones.

I successfully ran this large IFS and PPs test script from AT against
dash with this patch:

http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh

My own modernish regression test suite, which tests some pretty crazy
things including lots of IFS and positional parameters stuff, also
passes completely:

git clone https://github.com/modernish/modernish
cd modernish
dash bin/modernish --test
(still under development, may occasionally break)

- M.
diff --git a/src/expand.c b/src/expand.c
index 705fef7..c3d20a4 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -771,7 +771,7 @@ vsplus:
 
if (subtype == VSNORMAL) {
 record:
-   if (!easy)
+   if (!easy || varlen <= 0)
goto end;
recordregion(startloc, expdest - (char *)stackblock(), quoted);
goto end;


Re: RFC/Feature request: UTF-8 support

2018-03-22 Thread Martijn Dekker
Op 22-03-18 om 20:12 schreef Harald van Dijk:
> Isn't all of busybox, including ash, GPL? Wouldn't that mean that if any
> busybox code is imported into dash, that from then on dash can only be
> distributed under GPL?

I thought dash was already effectively under the GPL due to including
the output of mksignames.c.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC/Feature request: UTF-8 support

2018-03-22 Thread Martijn Dekker
It's 2018 and UTF-8 is the de-facto standard character set for the
shell. IMO, supporting it should no longer be considered optional.

Busybox ash mostly tracks dash in applying patches and such, but it does
a good job of supporting UTF-8. The licenses are bidirectionally
compatible. Perhaps UTF-8 support could be ported over in the other
direction? Maybe Denys Vlanesko could assist.

Opinions?

Thanks,

- Martijn
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [v2] don't record empty IFS scan regions

2018-03-22 Thread Martijn Dekker
evalvar() records empty expansion results (varlen == 0) as string
regions that need to be scanned for IFS characters. This is pointless,
because there is nothing to split.

This patch fixes the following bug, which is apparently a side effect of
the above:

$ dash -c 'IFS=; set --
set -- $@ $*; printf $#,
set -- $@ $*; printf $#,
set -- $@ $*; echo $#'
2,4,8

Expected output: 0,0,0. Given set & empty IFS and no positional
parameters, unquoted $@ and $* incorrectly generate one empty field
(they should generate no fields).

- M.
diff --git a/src/expand.c b/src/expand.c
index 705fef7..03a9b0c 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -771,7 +771,7 @@ vsplus:
 
if (subtype == VSNORMAL) {
 record:
-   if (!easy)
+   if (!easy || varlen == 0)
goto end;
recordregion(startloc, expdest - (char *)stackblock(), quoted);
goto end;


Re: [PATCH] don't record empty IFS scan regions

2018-03-22 Thread Martijn Dekker
Op 22-03-18 om 10:35 schreef Herbert Xu:
> Martijn Dekker <mart...@inlv.org> wrote:
>>
>> evalvar() records empty expansion results (varlen == 0) as string
>> regions that need to be scanned for IFS characters. This is pointless,
>> because there is nothing to split.
>>
>> This patch fixes the bug that, given no positional parameters, unquoted
>> $@ and $* incorrectly generate one empty field (they should generate no
>> fields). Apparently that was a side effect of the above.
> 
> Please attach a reproducer as I cannot reproduce this problem.

Sorry, I forgot to mention it only occurs with set & empty IFS.

It came up earlier in the thread about Harald van Dijk's patch for a
recursive parser.

$ dash-0.5.9.1 -c 'IFS=; set --; set -- $@ $*; echo $#'
2

(expected output: 0)

I'll resend the patch with this reproducer.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] provide .gitignore

2018-03-10 Thread Martijn Dekker
Here's a .gitignore file for the convenience of casual git users.

- M.

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..579bd47
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,41 @@
+# .gitignore for dash
+
+# generated by autogen.sh
+Makefile.in
+/aclocal.m4
+/autom4te.cache/
+/compile
+/config.h.in
+/configure
+/depcomp
+/install-sh
+/missing
+
+# generated by configure
+Makefile
+/config.cache
+/config.h
+/config.log
+/config.status
+/src/.deps/
+/stamp-h1
+
+# generated by make
+/src/token_vars.h
+
+# Apple debug symbol bundles
+*.dSYM/
+
+# backups and patch artefacts
+*~
+*.bak
+*.orig
+*.rej
+
+# OS generated files
+.DS_Store
+.DS_Store?
+._*
+.Spotlight*
+.Trash*
+*[Tt]humbs.db
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Martijn Dekker
Op 07-03-18 om 16:29 schreef Herbert Xu:
> I also didn't quite like the idea of scanning the string backwards
> to find the previous syntax.  So here is my attempt at the recursive
> parsing using alloca. 

This version introduces a parsing bug:

$ src/dash -c 'x=0; x=$((${x}+1))'
src/dash: 1: Syntax error: Unterminated quoted string

It is triggered by the ${x} (with braces) within an arithmetic expression.

- M.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Greater resolution in test -nt / test -ot

2018-03-07 Thread Martijn Dekker
Op 07-03-18 om 15:46 schreef Martijn Dekker:
> Op 06-03-18 om 09:19 schreef Herbert Xu:
>> On Thu, Jun 22, 2017 at 10:30:02AM +0200, Petr Skočík wrote:
>>> would you be willing to pull something like this?
> [...]
>>> I could use greater resolution in `test -nt` / `test -ot`, and st_mtim
>>> field is standardized under POSIX.1-2008 (or so stat(2) says).
>>
>> Sure.  But your patch is corrupted.
> 
> Fixed patch attached.
> 
> But I wouldn't apply it as is. My system does not have st_mtim. So I
> think it needs a configure test and a fallback to the old method.

Here's an attempt to make that happen. See attached.

- M.
diff --git a/configure.ac b/configure.ac
index 9c4ced8..0417fa2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -149,6 +149,20 @@ AC_CHECK_FUNC(open64,, [
AC_DEFINE(open64, open, [64-bit operations are the same as 32-bit])
 ])
 
+dnl Check if struct stat has st_mtim.
+AC_MSG_CHECKING(for stat::st_mtim)
+AC_COMPILE_IFELSE(
+[AC_LANG_PROGRAM([#include 
+#include 
+#include ],
+[struct stat foo; return sizeof(foo.st_mtim.tv_sec)])],
+have_st_mtim=yes, have_st_mtim=no)
+AC_MSG_RESULT($have_st_mtim)
+if test "$have_st_mtim" = "yes"; then
+   AC_DEFINE([HAVE_ST_MTIM], [1],
+   [Define if your `struct stat' has `st_mtim'])
+fi
+
 AC_ARG_WITH(libedit, AS_HELP_STRING(--with-libedit, [Compile with libedit 
support]))
 use_libedit=
 if test "$with_libedit" = "yes"; then
diff --git a/src/bltin/test.c b/src/bltin/test.c
index 58c05fe..d1458df 100644
--- a/src/bltin/test.c
+++ b/src/bltin/test.c
@@ -476,9 +476,17 @@ newerf (const char *f1, const char *f2)
 {
struct stat b1, b2;
 
+#ifdef HAVE_ST_MTIM
+   return (stat (f1, ) == 0 &&
+   stat (f2, ) == 0 &&
+   ( b1.st_mtim.tv_sec > b2.st_mtim.tv_sec ||
+(b1.st_mtim.tv_sec == b2.st_mtim.tv_sec && (b1.st_mtim.tv_nsec 
> b2.st_mtim.tv_nsec )))
+   );
+#else
return (stat (f1, ) == 0 &&
stat (f2, ) == 0 &&
b1.st_mtime > b2.st_mtime);
+#endif
 }
 
 static int
@@ -486,9 +494,17 @@ olderf (const char *f1, const char *f2)
 {
struct stat b1, b2;
 
+#ifdef HAVE_ST_MTIM
+   return (stat (f1, ) == 0 &&
+   stat (f2, ) == 0 &&
+   (b1.st_mtim.tv_sec < b2.st_mtim.tv_sec ||
+(b1.st_mtim.tv_sec == b2.st_mtim.tv_sec && (b1.st_mtim.tv_nsec 
< b2.st_mtim.tv_nsec )))
+   );
+#else
return (stat (f1, ) == 0 &&
stat (f2, ) == 0 &&
b1.st_mtime < b2.st_mtime);
+#endif
 }
 
 static int


Re: [PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))

2018-03-07 Thread Martijn Dekker
Op 07-03-18 om 06:26 schreef Herbert Xu:
> Martijn Dekker <mart...@inlv.org> wrote:
>>
>>> Since base is always a constant 0 or a constant 10, never a
>>> user-provided value, the only error that strtoimax will ever report on
>>> glibc systems is ERANGE. Checking only ERANGE therefore preserves the
>>> glibc behaviour, and allows the exact same set of errors to be detected
>>> on non-glibc systems.
>>
>> That makes sense, thanks.
> 
> Could you resend your patch with this change please?

OK, see below.

- M.

diff --git a/src/mystring.c b/src/mystring.c
index 0106bd2..de624b8 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -125,7 +125,7 @@ intmax_t atomax(const char *s, int base)
errno = 0;
r = strtoimax(s, , base);

-   if (errno != 0)
+   if (errno == ERANGE)
badnum(s);

/*

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Development branch?

2018-03-06 Thread Martijn Dekker
Hi,

There's been no activity on dash git since 23 Sep 2016, the 0.5.9.1
release date. Since then there have been too many patches to keep track
of, some of them fixing important bugs.

Herbert, could we have a development git branch that has these patches
applied (the ones you approve of)?

Thanks,

- Martijn
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2018-03-06 Thread Martijn Dekker
Op 06-03-18 om 08:45 schreef Herbert Xu:
> However, your patch also breaks here-
> document parsing when the delimiter is a single backslash.
> 
>   cat << "\"
>   \

That is supposed to break. "\" is not a correctly quoted backslash. Try
'\' or "\\" or \\

- M.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))

2018-03-05 Thread Martijn Dekker
Op 06-03-18 om 00:23 schreef Harald van Dijk:
> On 3/6/18 12:23 AM, Martijn Dekker wrote:
>> "All variables shall be initialized to zero if they are not otherwise
>> assigned by the input to the application."
> 
> In your example, x has been assigned. Its value is empty, but empty and
> unset are two different things.
> 
> But I see now that dash gives the same error when x is unset. That part
> is definitely wrong for exactly the reason you point out.

By default, all unset variables are considered empty, so they should act
the same.

That brings me to another possible issue, though:

$ dash -uc 'unset x; echo $((x))'
0

Shouldn't that give an "parameter not set" error under set -u?
Interestingly, only bash and AT ksh93 currently do that.


>> Also, consensus among existing shells appears to be universal.
> 
> Mostly, but not fully. yash is the exception:
> 
>   $ yash -c 'unset x; echo $((x))'
>   0
>   $ yash -c 'x=; echo $((x))'
> 


But in POSIX mode it acts like other shells:

$ yash -o posix -c 'x=; echo $((x))'
0


> yash is pretty special when it comes to shell arithmetic:
> 
>   $ yash -c 'x=abc; echo $((x))'
>   abc
>   $ yash -c 'x=abc; echo $((x+0))'
>   yash: arithmetic: `abc' is not a valid number


$ yash -o posix -c 'x=abc; echo $((x))'
yash: arithmetic: `abc' is not a valid number


>>> But perhaps
>>>
>>>    if (errno == ERANGE)
>>>
>>> is all that's needed here.
>>
>> Explain?
> 
> Since base is always a constant 0 or a constant 10, never a
> user-provided value, the only error that strtoimax will ever report on
> glibc systems is ERANGE. Checking only ERANGE therefore preserves the
> glibc behaviour, and allows the exact same set of errors to be detected
> on non-glibc systems.


That makes sense, thanks.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))

2018-03-05 Thread Martijn Dekker
Op 05-03-18 om 22:41 schreef Harald van Dijk:
> On 3/5/18 1:32 AM, Martijn Dekker wrote:
>> dash compiled on Mac OS X (macOS) and FreeBSD manifests the following
>> bug:
>>
>> $ dash -c 'x=; echo $((x))'
>> dash: 1: Illegal number:
>>
>> This error is printed instead of the expected default "0" that should be
>> substituted for an empty variable in an arithmetic expression.
>>
>> The bug does not occur on Linux, NetBSD, OpenBSD or Solaris.
> 
> There is no reason why dash should behave differently on FreeBSD vs.
> Linux, so I agree with your patch, but isn't this non-standard, isn't
> either behaviour allowed?

I don't think so.

2.6.4 Arithmetic Expansion:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04
"The arithmetic expression shall be processed according to the rules
given in Arithmetic Precision and Operations [...]".

Arithmetic Precision and Operations:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_01_02_01
"All variables shall be initialized to zero if they are not otherwise
assigned by the input to the application."

Also, consensus among existing shells appears to be universal.

> This looks good, it does the job, but it can be simplified a bit:
> 
> If errno == EINVAL, then p == s is already known.
> 
> If errno != 0 && p == s, then errno == EINVAL is already known.
> 
> This allows simplifying to either of the following:
> 
>   if (errno != 0 && errno != EINVAL)
>   if (errno != 0 && p != s)

Good point.

> But perhaps
> 
>   if (errno == ERANGE)
> 
> is all that's needed here.

Explain?

Thanks,

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix "Illegal number" on FreeBSD & macOS for x=; echo $((x))

2018-03-04 Thread Martijn Dekker
dash compiled on Mac OS X (macOS) and FreeBSD manifests the following bug:

$ dash -c 'x=; echo $((x))'
dash: 1: Illegal number:

This error is printed instead of the expected default "0" that should be
substituted for an empty variable in an arithmetic expression.

The bug does not occur on Linux, NetBSD, OpenBSD or Solaris.

I traced the problem to the fact that strtoimax(3) on macOS and FreeBSD
returns EINVAL in errno for an empty string -- unlike those other
systems, which return 0 in errno for an empty string.

POSIX says of strtoimax(3):
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtoimax.html
| These functions may fail if:
|
| [EINVAL]
| No conversion could be performed.

It seems reasonable to consider that "no conversion could be performed"
if the string to convert is empty. Returning EINVAL if no conversion
could be performed is optional ("may fail"). So it seems to me that both
the FreeBSD/macOS behaviour and that of the other systems is POSIX
compliant.

The following patch should eliminate the bug on FreeBSD, macOS and any
other POSIX system which may act the same, without affecting behaviour
on other systems.

- M.

diff --git a/src/mystring.c b/src/mystring.c
index 0106bd2..c7df41f 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -125,7 +125,7 @@ intmax_t atomax(const char *s, int base)
errno = 0;
r = strtoimax(s, , base);

-   if (errno != 0)
+   if (errno != 0 && !(errno == EINVAL && p == s))
badnum(s);

/*
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Martijn Dekker
Op 04-03-18 om 16:46 schreef Harald van Dijk:
> FreeBSD sh also prints a blank line here.
[...]
> Like above, FreeBSD sh behaves like ksh.

I stand corrected.

Is there any port of FreeBSD sh to other operating systems? It would be
much more convenient for me to include it in my tests if I didn't have
to launch a FreeBSD VM and rsync & run the test scripts separately.


> Yes, the inconsistency should be fixed. Either it should be treated as
> quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I
> have no strong feelings on which it should be.

Neither do I, so I would default to the behaviour that both pre-exists
in dash and corresponds with the majority of other shells.


>>> Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
>>> "$@" got handled and reverting that changed it, I looked into how
>>> this works and fixed another bug. It also changes the handling of $*
>>> and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
>>> properly at all, even if all parameters were non-empty. dash 0.5.9.1
>>> preserves empty parameters. With this patch, they get removed just
>>> like in bash. POSIX allows for either.
>> I don't think it does. POSIX implies that empty $@ and $* generates zero
>> fields. 2.5.2 Special Parameters:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
>>
>> | @
>> |    Expands to the positional parameters, starting from one, initially
>> |    producing one field for each positional parameter that is set.
>>
>> Since there are no positional parameters, no fields should initially be
>> produced at all. Same for $*.
>>
>> So I do believe your patch correctly fixes a bug here.
> 
> By empty parameters, I meant parameters that had been set to an empty
> string. It's covered by the 'set -- a ""' in my tests.

Ah yes, sorry for misreading. You're right that POSIX allows for either
removing or not removing empty fields generated by unquoted $@ and $*.
Note that the next iteration of POSIX will likely mandate their removal,
though. See the descripton of Austin Group bug 888:
http://austingroupbugs.net/view.php?id=888


>  You're right that
> after 'set --', unquoted $@ should not produce any fields. I hadn't even
> noticed that dash got that wrong and that my patch had fixed it.

:)


[...]
>> $ src/dash -c 'printf "%s\n" "${$+\}}"'
>> \}
>>
>> Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
>> mksh, zsh. In other words: it should be possible to escape a '}' with a
>> backslash within the parameter expansion, even if the expansion is
>> quoted.
>>
>> POSIX ref.: 2.6.2 Parameter Expansion
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
>>
>> | Any '}' escaped by a  or within a quoted string, and
>> | characters in embedded arithmetic expansions, command substitutions,
>> | and variable expansions, shall not be examined in determining the
>> | matching '}'.
> 
> I believe this actually requires dash's behaviour. This says the first }
> isn't examined in determining the matching '}', but only that: it just
> says the parameter expansion expression is $+\}. It doesn't say the
> backslash is removed.

I believe the word "escaped" implies that removal. If a '}' is escaped
by a backslash, it's implied that the backslash is removed as this
escaping is parsed, just as it's implied that quotes are removed from a
quoted string.

> I agree that it would be much better to print } here though. 

All other current shells except bosh (schilytools sh) agree, too -- even
FreeBSD sh, and I checked it this time.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Martijn Dekker
Op 04-03-18 om 11:44 schreef Harald van Dijk:
> When CTLENDVAR is seen, I double-check that syntax has the expected
> value. This fixes the handling of "${$+"}"}", where the inner } was
> seen as ending the variable substitution.

Also looks like dash with your patch considers the "}" to be quoted:

$ src/dash -c 'IFS=}; printf %s\\n "${$+"}"}"'
}

Only AT ksh93 prints an empty string there, as it doesn't consider the
double quotes to be nested, so the "}" is unquoted. Ash produces a
syntax error, like dash before this patch. All other shells print a }.
So dash with this patch behaves like the majority.


> This fixes more cases than just backslashes and single quotes:
> another character that's special in unquoted contexts is ~, so
> "${HOME#~}" should expand to an empty string.

Yes. That was a bug and this patch fixes it.


> This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really
> answer this, I think. POSIX says that $(( "123" )) is a syntax error,
> but doesn't address whether " is special when it appears in other
> places than directly in the $((...)). Most shells accept $((
> ${$+"123"} )), and with this patch, dash accepts it too.

I've no strong feelings about this. The change doesn't seem like it
could be harmful.


> This changes how "${x+"$y"}" get handled: POSIX is silent about
> whether the $y should be treated as quoted. dash has treated it as
> quoted for a very long time. ash has historically treated it as
> unquoted. With this patch, it gets treated as unquoted.

That seems inconsistent with how it handles "${$+"}"}", in which the "}"
is treated as quoted (see above).

ksh93 is the only existing shell that treats the $y as unquoted, so I
think it would be better if dash continued to treat the $y as quoted.

Even if not, the inconsistency should be fixed.


> Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
> "$@" got handled and reverting that changed it, I looked into how
> this works and fixed another bug. It also changes the handling of $*
> and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
> properly at all, even if all parameters were non-empty. dash 0.5.9.1
> preserves empty parameters. With this patch, they get removed just
> like in bash. POSIX allows for either.
I don't think it does. POSIX implies that empty $@ and $* generates zero
fields. 2.5.2 Special Parameters:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
| @
|Expands to the positional parameters, starting from one, initially
|producing one field for each positional parameter that is set.

Since there are no positional parameters, no fields should initially be
produced at all. Same for $*.

So I do believe your patch correctly fixes a bug here.


> I would be a bit surprised if the patch is acceptable in its current
> form, but it's worth seeing which of the current results are definitely
> correct, which of the results are acceptable, which results may well be
> unwanted, and which special cases I missed.

According to my tests (i.e. the modernish test suite), nearly everything
is POSIXly correct. There's only one parameter expansion problem left
that I can find, and it existed before this patch as well:

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


Thanks,

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-13 Thread Martijn Dekker
Op 13-02-18 om 14:53 schreef Denys Vlasenko:
> $ >'\'
> $ >'\'
> $ dash -c 'echo "\*"'
> \ \
> 
> The cause: uses "\\*" pattern instead of "\\\*".

Also:

$ dash -c 'case \\ab in "\*") echo BUG;; esac'
BUG
$ dash -c 'case \\a in "\?") echo BUG;; esac'
BUG

Yup. Full globbing within double quotes after a backslash.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] quoted substring parameter expansion ignores single-quotes in the arg

2017-10-25 Thread Martijn Dekker
Op 21-10-17 om 14:13 schreef Jilles Tjoelker:
[...]
> I think it is sufficiently clear that various special
> characters are active in ${param#word}, whether the outer substitution
> is within double-quotes or not.

Yes -- this came up on austin-group-l some time ago as well.
https://www.mail-archive.com/austin-group-l@opengroup.org/msg00197.html

> Although zsh is a good interactive shell, it does not follow
> POSIX as closely; not even in sh or ksh emulation mode.

I think that may have changed. Try the latest version. Over the last
year or two, many POSIX compliance bugs have been fixed. I believe the
latest version is about as compliant as bash or dash.

Zsh does still accept the wrong "${param#'}" expansion like dash does,
but handling non-compliant input is more like an extension than actual
non-compliance. Zsh acts correctly on "${param#\'}", like dash. It also
acts correctly on "${param#'foo'}", *unlike* dash.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] 'nolog' and 'debug' options cause "$-" to wreak havoc

2017-03-29 Thread Martijn Dekker
Bug: if either the 'nolog' or the 'debug' option is set, trying to
expand "$-" silently aborts parsing of an entire argument.

$ dash -o nolog -c 'set -fuC; echo "|$- are the options|"; \
set +o nolog; echo "|$- are the options|"'
|
|uCf are the options|
$ dash -o debug -c 'set -fuC; echo "|$- are the options|"; \
set +o debug; echo "|$- are the options|"'
|
|uCf are the options|

Also, though the 'nolog' option is POSIX[*] and (apart from this bug)
acts like POSIX, it's not documented the dash man page.

- Martijn

[*] "nolog: Prevent the entry of function definitions into the command
history; see Command History List."
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_25_03
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] quote arguments in xtrace output

2017-03-17 Thread Martijn Dekker
Op 28-02-17 om 04:39 schreef Martijn Dekker:
> Op 28-02-17 om 00:17 schreef Martijn Dekker:
>> Here is a version that does that, and removes '=' and '!' from the list
>> of shell-safe characters. This should fix all the issues you were
>> reporting, hopefully making the xtrace output completely safe for shell
>> re-entry. It introduces a new function is_kwd() that checks if a string
>> is identical to a shell keyword/reserved word.
>>
>> Attached are two patches: one incremental to my previous one, and one
>> against pristine dash 0.5.9.1.
> 
> There's a bug in my code. Empties need to be quoted, or they'll
> disappear. A simple check for the first character being null fixes it.
> 
> Once again, one incremental patch against v2, one against pristine dash
> 0.5.9.1.

My is_kwd() function was redundant; findkwd() already exists. Take
four... and sorry for the noise.

- M.

diff -ur dash-0.5.9.1.orig/src/mystring.c dash-0.5.9.1/src/mystring.c
--- dash-0.5.9.1.orig/src/mystring.c	2017-03-18 05:23:28.0 +0100
+++ dash-0.5.9.1/src/mystring.c	2017-03-18 05:20:51.0 +0100
@@ -182,22 +182,6 @@
 	return 1;
 }
 
-/*
- * Check if a string is identical to a shell keyword (reserved word).
- */
-
-int
-is_kwd(const char *s)
-{
-	int i = 0;
-
-	do {
-		if (strcmp(s, parsekwd[i++]) == 0)
-			return 1;
-	} while (*parsekwd[i-1] != '}');  /* assuming that "}" is last entry */
-	
-	return 0;
-}
 	
 
 /*
@@ -213,7 +197,7 @@
 single_quote(const char *s, int conditional) {
 	char *p;
 
-	if (conditional && *s != '\0' && s[strspn(s, SHELLSAFECHARS)] == '\0' && ! is_kwd(s))
+	if (conditional && *s != '\0' && s[strspn(s, SHELLSAFECHARS)] == '\0' && ! findkwd(s))
 		return (char *)s;
 
 	STARTSTACKSTR(p);
diff -ur dash-0.5.9.1.orig/src/alias.c dash-0.5.9.1/src/alias.c
--- dash-0.5.9.1.orig/src/alias.c	2014-09-28 10:19:32.0 +0200
+++ dash-0.5.9.1/src/alias.c	2017-03-18 05:19:06.0 +0100
@@ -197,7 +197,7 @@
 
 void
 printalias(const struct alias *ap) {
-	out1fmt("%s=%s\n", ap->name, single_quote(ap->val));
+	out1fmt("%s=%s\n", ap->name, single_quote(ap->val, 0));
 }
 
 STATIC struct alias **
diff -ur dash-0.5.9.1.orig/src/eval.c dash-0.5.9.1/src/eval.c
--- dash-0.5.9.1.orig/src/eval.c	2016-09-02 16:12:23.0 +0200
+++ dash-0.5.9.1/src/eval.c	2017-03-18 05:19:06.0 +0100
@@ -95,7 +95,8 @@
 STATIC int evalbltin(const struct builtincmd *, int, char **, int);
 STATIC int evalfun(struct funcnode *, int, char **, int);
 STATIC void prehash(union node *);
-STATIC int eprintlist(struct output *, struct strlist *, int);
+STATIC int eprintvarlist(struct output *, struct strlist *, int);
+STATIC void eprintarglist(struct output *, struct strlist *, int);
 STATIC int bltincmd(int, char **);
 
 
@@ -786,8 +787,8 @@
 		out = 
 		outstr(expandstr(ps4val()), out);
 		sep = 0;
-		sep = eprintlist(out, varlist.list, sep);
-		eprintlist(out, arglist.list, sep);
+		sep = eprintvarlist(out, varlist.list, sep);
+		eprintarglist(out, arglist.list, sep);
 		outcslow('\n', out);
 #ifdef FLUSHERR
 		flushout(out);
@@ -1107,16 +1108,35 @@
 
 
 STATIC int
-eprintlist(struct output *out, struct strlist *sp, int sep)
+eprintvarlist(struct output *out, struct strlist *sp, int sep)
 {
 	while (sp) {
 		const char *p;
+		int i;
 
-		p = " %s" + (1 - sep);
+		if (sep)
+			outfmt(out, " ");
 		sep |= 1;
-		outfmt(out, p, sp->text);
+		i = 0;
+		while (sp->text[i] != '=' && sp->text[i] != '\0')
+			outfmt(out, "%c", sp->text[i++]);
+		if (sp->text[i] == '=')
+			outfmt(out, "=%s", single_quote(sp->text+i+1, 1));
 		sp = sp->next;
 	}
 
 	return sep;
 }
+
+STATIC void
+eprintarglist(struct output *out, struct strlist *sp, int sep)
+{
+	while (sp) {
+		const char *p;
+
+		p = " %s" + (1 - sep);
+		sep |= 1;
+		outfmt(out, p, single_quote(sp->text, 1));
+		sp = sp->next;
+	}
+}
diff -ur dash-0.5.9.1.orig/src/mystring.c dash-0.5.9.1/src/mystring.c
--- dash-0.5.9.1.orig/src/mystring.c	2014-09-28 10:19:32.0 +0200
+++ dash-0.5.9.1/src/mystring.c	2017-03-18 05:20:51.0 +0100
@@ -55,6 +55,7 @@
 #include "memalloc.h"
 #include "parser.h"
 #include "system.h"
+#include "token_vars.h"
 
 
 char nullstr[1];		/* zero length string */
@@ -181,16 +182,24 @@
 	return 1;
 }
 
+	
 
 /*
  * Produce a possibly single quoted string suitable as input to the shell.
- * The return string is allocated on the stack.
+ * If 'conditional' is nonzero, quoting is only done if the string contains
+ * non-shellsafe characters, or is identical to a shell keyword (reserved
+ * word); if it is zero, quoting is always done.
+ * If quoting was done, the return string is allocated on the stack,
+ * otherwise a pointer to the original string is returned.
  */
 
 char *
-si

Re: [PATCH] quote arguments in xtrace output

2017-02-28 Thread Martijn Dekker
Op 28-02-17 om 00:17 schreef Martijn Dekker:
> Op 27-02-17 om 21:08 schreef Martijn Dekker:
>> Shell reserved words (a.k.a. shell keywords) such as "if" are harder to
>> fix. The single_quote() routine, if told to quote conditionally, would
>> have to iterate through the internal list of shell reserved words and
>> quote the string if it matches one of those. I will see if I can make
>> that happen.
> 
> Here is a version that does that, and removes '=' and '!' from the list
> of shell-safe characters. This should fix all the issues you were
> reporting, hopefully making the xtrace output completely safe for shell
> re-entry. It introduces a new function is_kwd() that checks if a string
> is identical to a shell keyword/reserved word.
> 
> Attached are two patches: one incremental to my previous one, and one
> against pristine dash 0.5.9.1.

There's a bug in my code. Empties need to be quoted, or they'll
disappear. A simple check for the first character being null fixes it.

Once again, one incremental patch against v2, one against pristine dash
0.5.9.1.

- M.

diff -ur dash-0.5.9.1.orig/src/mystring.c dash-0.5.9.1/src/mystring.c
--- dash-0.5.9.1.orig/src/mystring.c	2017-02-28 04:37:45.0 +0100
+++ dash-0.5.9.1/src/mystring.c	2017-02-28 04:31:39.0 +0100
@@ -213,7 +213,7 @@
 single_quote(const char *s, int conditional) {
 	char *p;
 
-	if (conditional && s[strspn(s, SHELLSAFECHARS)] == '\0' && ! is_kwd(s))
+	if (conditional && *s != '\0' && s[strspn(s, SHELLSAFECHARS)] == '\0' && ! is_kwd(s))
 		return (char *)s;
 
 	STARTSTACKSTR(p);
diff -ur dash-0.5.9.1.orig/src/alias.c dash-0.5.9.1/src/alias.c
--- dash-0.5.9.1.orig/src/alias.c	2014-09-28 10:19:32.0 +0200
+++ dash-0.5.9.1/src/alias.c	2017-02-28 04:28:31.0 +0100
@@ -197,7 +197,7 @@
 
 void
 printalias(const struct alias *ap) {
-	out1fmt("%s=%s\n", ap->name, single_quote(ap->val));
+	out1fmt("%s=%s\n", ap->name, single_quote(ap->val, 0));
 }
 
 STATIC struct alias **
diff -ur dash-0.5.9.1.orig/src/eval.c dash-0.5.9.1/src/eval.c
--- dash-0.5.9.1.orig/src/eval.c	2016-09-02 16:12:23.0 +0200
+++ dash-0.5.9.1/src/eval.c	2017-02-28 04:28:31.0 +0100
@@ -95,7 +95,8 @@
 STATIC int evalbltin(const struct builtincmd *, int, char **, int);
 STATIC int evalfun(struct funcnode *, int, char **, int);
 STATIC void prehash(union node *);
-STATIC int eprintlist(struct output *, struct strlist *, int);
+STATIC int eprintvarlist(struct output *, struct strlist *, int);
+STATIC void eprintarglist(struct output *, struct strlist *, int);
 STATIC int bltincmd(int, char **);
 
 
@@ -786,8 +787,8 @@
 		out = 
 		outstr(expandstr(ps4val()), out);
 		sep = 0;
-		sep = eprintlist(out, varlist.list, sep);
-		eprintlist(out, arglist.list, sep);
+		sep = eprintvarlist(out, varlist.list, sep);
+		eprintarglist(out, arglist.list, sep);
 		outcslow('\n', out);
 #ifdef FLUSHERR
 		flushout(out);
@@ -1107,16 +1108,35 @@
 
 
 STATIC int
-eprintlist(struct output *out, struct strlist *sp, int sep)
+eprintvarlist(struct output *out, struct strlist *sp, int sep)
 {
 	while (sp) {
 		const char *p;
+		int i;
 
-		p = " %s" + (1 - sep);
+		if (sep)
+			outfmt(out, " ");
 		sep |= 1;
-		outfmt(out, p, sp->text);
+		i = 0;
+		while (sp->text[i] != '=' && sp->text[i] != '\0')
+			outfmt(out, "%c", sp->text[i++]);
+		if (sp->text[i] == '=')
+			outfmt(out, "=%s", single_quote(sp->text+i+1, 1));
 		sp = sp->next;
 	}
 
 	return sep;
 }
+
+STATIC void
+eprintarglist(struct output *out, struct strlist *sp, int sep)
+{
+	while (sp) {
+		const char *p;
+
+		p = " %s" + (1 - sep);
+		sep |= 1;
+		outfmt(out, p, single_quote(sp->text, 1));
+		sp = sp->next;
+	}
+}
diff -ur dash-0.5.9.1.orig/src/mystring.c dash-0.5.9.1/src/mystring.c
--- dash-0.5.9.1.orig/src/mystring.c	2014-09-28 10:19:32.0 +0200
+++ dash-0.5.9.1/src/mystring.c	2017-02-28 04:31:39.0 +0100
@@ -55,6 +55,7 @@
 #include "memalloc.h"
 #include "parser.h"
 #include "system.h"
+#include "token_vars.h"
 
 
 char nullstr[1];		/* zero length string */
@@ -181,16 +182,40 @@
 	return 1;
 }
 
+/*
+ * Check if a string is identical to a shell keyword (reserved word).
+ */
+
+int
+is_kwd(const char *s)
+{
+	int i = 0;
+
+	do {
+		if (strcmp(s, parsekwd[i++]) == 0)
+			return 1;
+	} while (*parsekwd[i-1] != '}');  /* assuming that "}" is last entry */
+	
+	return 0;
+}
+	
 
 /*
  * Produce a possibly single quoted string suitable as input to the shell.
- * The return string is allocated on the stack.
+ * If 'conditional' is nonzero, quoting is only done if the string contains
+ * non-shellsafe characters, or is identical to a shell keyword (reserved
+ * word); if it is zero, quoting is always done.
+ * If quoting was do

Re: dash tested against ash testsuite: 17 failures

2016-10-08 Thread Martijn Dekker
Op 01-10-16 om 19:17 schreef Denys Vlasenko:
> ash-glob/glob2.tests:
> Evidently, dash supports \f -> ^L escape.
> This test uses \f as invalid backslash escape,
> hence differences.

The test uses the "echo" builtin, which is very very unportable and
explicitly not standardised by POSIX for this case. A test failure based
on a difference in how "echo" interprets backslash escapes is not really
relevant. Use printf instead.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16
"It is not possible to use echo portably across all POSIX systems unless
both -n (as the first argument) and escape sequences are omitted."

> ash-misc/echo_write_error.tests
> EPIPE errors in echo are not reported

They're reported as an I/O error in echo. While that is not as specific,
it's still accurate. I don't really see a problem.

> ash-misc/func2.tests
> $((i++)) not supported

This is not required by POSIX. Use $((i+=1)) instead.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04
"The sizeof() operator and the prefix and postfix "++" and "--"
operators are not required."

> ash-misc/local1.tests
> Doesn't unset as described:
> local a
> # the above line unsets $a
> echo "A2:'$a'"

>From the dash man page:
"When a variable is made local, it inherits the initial value and
exported and readonly flags from the variable with the same name in the
surrounding scope, if there is one.  Otherwise, the variable is
initially unset."

Looks to me like dash behaves as advertised. Local variables aren't
standardised so implementations are free to do as they please.

(This is an interesting difference though, one I wasn't aware of until now.)

> ash-misc/shift1.tests
> "shift N" fails if fewer than N argv[i] exists
> (likely not a bug, but bash does it differently)

POSIX explicitly allows either behaviour:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_26_14
"If the n operand is invalid or is greater than "$#", this may be
considered a syntax error and a non-interactive shell may exit; if the
shell does not exit in this case, a non-zero exit status shall be
returned. Otherwise, zero shall be returned."

Cross-platform scripts must check that the operand is less than or equal
than $# before invoking 'shift'.

> ash-redir/redir.tests
> echo errors due to closed stdout are not reported

Again, they are in fact reported but as an I/O error. My response would
be the same as for ash-misc/echo_write_error.tests above.

> ash-redir/redir3.tests
> "echo foo >&9" correctly says "9: Bad file descriptor"
> and exitcode is 2 (in bash, it is 1).

Not a bug, IMO. POSIX simply says: "A failure to open or create a file
shall cause a redirection to fail" but does not specify with what exit
status it should fail. Any non-zero exit status < 128 should be fine.

> ash-redir/redir7.tests
> ash-redir/redir8.tests
> uni\x81code filename is not found by uni?code glob pattern.

dash does not support unicode; AFAIK, this is a design choice (I'm not a
developer).

> ash-signals/reap1.tests
> Builtins never wait for children. This loop will not ever stop:
> sleep 1 &
> PID=$!
> while kill -0 $PID >/dev/null 2>&1; do
> true
> done

Interesting bug.

The behaviour seems to depend on the presence of a 'while' loop.
Executed manually without a loop, it behaves as expected:

$ sleep 10 & PID=$!
$ kill -0 $PID
$ kill -0 $PID
[...etc...]
$ kill -0 $PID
[1] + Done   sleep 10
$ kill -0 $PID
dash: 11: kill: No such process

Executed with a loop it's infinite even in an interactive shell, but
only the first time around:

$ sleep 10 & PID=$!
$ while kill -0 $PID; do :; done
^C
[1] + Done   sleep 10
$ while kill -0 $PID; do :; done
dash: 15: kill: No such process

Strange behaviour and certainly looks like a rather obscure bug.

> ash-signals/savetrap.tests
> `trap` does not work as expected

Again, the SIG prefix is not required to be supported in POSIX (see above).

Unfortunately, according to POSIX, the feature that a subshell
containing only a single "trap" command with no operands inherits the
parent shell's traps so that they can be stored like save_traps=$(trap),
is optional! Hence this can't be considered a bug in dash, although I'd
certainly like to submit it as a feature request.

POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28
"When a subshell is entered, traps that are not being ignored shall be
set to the default actions, except in the case of a command substitution
containing only a single trap command, when the traps need not be
altered. Implementations may check for this case using only lexical
analysis [...]."

Only bash, yash, AT ksh93 and busybox ash currently support this
optional feature. This makes it extremely inconvenient to store the
output of 'trap' in a variable in a 

[BUG] 'trap' is not quite POSIX compliant

2016-02-13 Thread Martijn Dekker
The 'trap' command in dash is not compliant with POSIX. According to the
spec, both '-' and any unsigned decimal integer should be accepted as an
argument meaning 'unset this trap'; dash only accepts '-'.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28

I did testing on all other POSIX shells I know of with this little script:

trap 'echo BYYYE' EXIT
trap "$1" EXIT

This script will either produce 'BYYYE' or output a "command not found"
error message, depending on whether $1 was accepted as an argument for
unsetting the trap.

None of the shells are technically compliant: they only interpret an
unsigned decimal integer up to a certain value as 'unset this trap'; a
value exceeding that is interpreted as a command.

Interestingly, that maximum value differs slightly between shells. So
far, I've found:
- dash and Busybox ash do not accept any.
- bash, yash, pdksh, mksh/lksh, FreeBSD /bin/sh, and AT ksh "AJM 93u+
2012-08-01" accept up to 31.
- AT ksh "M 1993-12-28 s+" accepts up to 32.
- zsh 4.3.11, zsh 5.1.1 and zsh 5.2-dev-1 (current git) on my Mac accept
up to 33.
- zsh 5.0.2 on FreeBSD accepts up to 34.

For compatibility purposes it might seem wise to follow the majority of
implementations, accepting up to 31.

By the way, dash and zsh, as well as bash in non-POSIX mode, also accept
the counterintuitive and non-POSIX-compliant form "trap EXIT" to unset
the trap, but other shells don't: bash (POSIX), yash and ksh93 produce
an error, while mksh/lksh silently ignore that form. What a mess...

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash: read does not ignore trailing spaces

2016-01-29 Thread Martijn Dekker
Harald van Dijk schreef op 04-12-15 om 19:51:
> Here it is. Attached is an updated patch that ignores the complete
> terminator if only a single field remains, otherwise ignores only
> trailing IFS whitespace.

I've tested the patch and it looks like it fixes the bug nicely.

With the patch, dash (like bash, ksh93, mksh and FreeBSD /bin/sh)
completely passes the IFS test suite written by the AT ksh93 people:
http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh
(archive.org link because the AT research site is down)

Any chance of getting this patch pushed into the git repo?

Thanks,

- M.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Inheriting IFS from environment

2015-12-30 Thread Martijn Dekker
Unlike bash, *ksh and zsh, dash allows inheriting IFS from the environment:

$ IFS=bla dash -c "x='hela hola'; echo \$x"
he  ho

This seems a bit dodgy from a security point of view. For instance, most
scripts don't bother to quote their variables in test commands such as [
$var -eq 0 ], making it possible to influence the program flow by
manipulating IFS from the outside.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] test: -gt: unexpected operator

2015-07-13 Thread Martijn Dekker
Herbert Xu schreef op 13-07-15 om 08:06:
 Thanks for the report.  Does this patch help?

Yes, it does.

Thanks,

- M.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] test: -gt: unexpected operator

2015-07-12 Thread Martijn Dekker
I found a bug in dash that affects checking the exit status of '[' or
'test' for failure.

After feeding an illegal number to 'test -t', 'test' will not accept any
operator (or at least not -gt or -lt) for the next invocation.

Confirmed in dash 0.5.7, 0.5.8 and current git version.

$ [ -t 12323454234578326584376438 ]
src/dash: 7: [: Illegal number: 12323454234578326584376438
$ [ $? -gt 1 ]  echo error
src/dash: 8: [: -gt: unexpected operator
$ [ $? -gt 1 ]  echo error
error
$ test -t 12323454234578326584376438
src/dash: 10: test: Illegal number: 12323454234578326584376438
$ test $? -gt 1  echo error
src/dash: 11: test: -gt: unexpected operator
$ test $? -gt 1  echo error
error
$ test -t 12323454234578326584376438
src/dash: 13: test: Illegal number: 12323454234578326584376438
$ test 2 -gt 1
src/dash: 14: test: -gt: unexpected operator
$ test 2 -gt 1
$ test -t 12323454234578326584376438
src/dash: 16: test: Illegal number: 12323454234578326584376438
$ test 2 -lt 1
src/dash: 17: test: -lt: unexpected operator
$ test 2 -lt 1
$

Thanks,

- Martijn
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash unset idiosyncrasies

2015-07-05 Thread Martijn Dekker
Parke schreef op 06-07-15 om 04:18:
 The man page does not say what happens if the given name corresponds
 only to a function.

Neither does POSIX, as you've found out: [...] if a variable by that
name does not exist, it is unspecified whether a function by that name,
if any, shall be unset. I agree the dash man page could do with
clarifying this.

I've found it best, to avoid bugs and confusion, to always use 'unset'
with either the -v or the -f option (but not both). This should work
consistently on all POSIXly shells.

FYI, other shells produce similar output for your test script (except
ATT ksh halts execution on 'unset' without parameters.) There seem to
be no shells that support unsetting both variables and functions with a
single 'unset' command.

- Martijn

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getopts doesn't properly update OPTIND when called from function

2015-06-04 Thread Martijn Dekker
Harald van Dijk schreef op 29-05-15 om 00:39:
 A quick patch to make sure it is global, and isn't reset when it
 shouldn't or doesn't need to be, is attached. You can experiment with
 it, if you like.

I've been using dash with this patch since you posted it, and it works
like a charm (including my function that extends getopts'
functionality). No issues encountered. Thanks.

Further discussion in this thread shows that the patch may conflict with
existing usage of 'getopts' for parsing the options within a function (a
usage that would make the script quite shell-specific, by the way,
because it would rely on Almquist-specific behaviour).

The issue, as I understand it, is that 'getopts' keeps not just the
OPTIND variable but also an additional invisible internal variable to
maintain its state. This is necessary to keep track of combined short
options.[*]

There appear to be two possible use cases for calling 'getopts' within a
function:

1. The option parsing loop is in the function, parsing the function's
options. This requires a function-local internal state of 'getopts',
otherwise calling a function using getopts from a main getopts loop
couldn't possibly work, because there is no way to directly save or
restore the unnamed internal state variable of getopts.

2. The option parsing loop is in the main shell environment, but instead
of calling getopts directly, the option parsing loop calls a function,
passing on the main positional parameters, and that function then calls
'getopts' and does additional things (in my case, re-parse GNU-style
--long options in terms of a special short option '--' with argument;
but of course it could be anything). This requires a global internal
'getopts' state.

Use case 1 requires a global internal 'getopts' state and use case 2
requires a local one, so they are mutually incompatible.

But I'm thinking that perhaps there is a way for the shell to
distinguish between these two use cases so that they can be reconciled.

The standard says that OPTIND is a global variable in any case, so use
case 1 above could only work if, before starting the function's option
parsing loop, OPTIND is either explicitly declared a function-local
variable using the non-standard 'local' keyword or is reinitialized
using an assignment.

On the other hand, use case 2 could only work if OPTIND is completely
left alone by the function, allowing a 'getopts' with a global state to
do its thing without interference.

So I would suggest the following might reconcile both use cases: By
default, make the 'getopts' internal state global. However, whenever
OPTIND is either assigned a value within a function or declared local
within a function, automatically make the 'getopts' internal state local
to the function.

Comments?

- M.

[*] Just as a datapoint, I found that yash has a different strategy for
this: it stores both values in OPTIND, separated by a semicolon -- e.g.
an $OPTIND of 3:2 means getopts is at the second option in the third
argument.

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] ${#var} returns length in bytes, not characters

2015-06-03 Thread Martijn Dekker
POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
 ${#parameter}
 String Length. The length in characters of the value of parameter
 shall be substituted. [...]

dash does not expand the length in characters; it expands the length in
bytes instead. That is invalid for locales that include multi-byte
characters, such as the now ubiquitous UTF-8 set.

Test case:

$ locale
LANG=nl_NL.UTF-8
LC_COLLATE=nl_NL.UTF-8
LC_CTYPE=nl_NL.UTF-8
LC_MESSAGES=nl_NL.UTF-8
LC_MONETARY=nl_NL.UTF-8
LC_NUMERIC=nl_NL.UTF-8
LC_TIME=nl_NL.UTF-8
LC_ALL=
$ word='bètatest'   # length: 8
$ echo ${#word}
9

Expected output: 8
Got output: 9

(bash, ksh93, mksh, and zsh all do this correctly.)

- Martijn
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html