Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-04-02 Thread Denys Vlasenko
On Mon, Apr 2, 2018 at 7:08 PM, Herbert Xu  wrote:
> On Mon, Apr 02, 2018 at 03:48:46PM +0200, Denys Vlasenko wrote:
>>
>> Another probably buggy case:
>>
>> cat <> -\t-\\-\"-\'-\`-\--\z-\*-\?-
>> `echo  '-\t-\\-\"-\x-\`-\--\z-\*-\?-'`
>> $(echo '-\t-\\-\"-\x-\`-\--\z-\*-\?-')
>> EOF
>>
>>
>> # bash heredoc_backslash.test
>> -\t-\-\"-\'-`-\--\z-\*-\?-
>> -\t-\-\"-\x-`-\--\z-\*-\?-
>> -\t-\\-\"-\x-\`-\--\z-\*-\?-
>>
>> # dash heredoc_backslash.test
>> -\t-\-\"-\'-`-\--\z-\*-\?-
>> --\-"-\x-`-\--\z-\*-\?-
>> --\-\"-\x-\`-\--\z-\*-\?-
>>
>> (Second and third line have \t expanded to tab. Mail-mangled to spaces, 
>> likely)
>>
>> Apart from \t, note that in second heredoc line there is a difference in how
>> `echo  '\"'` was handled: bash prints \", dash prints ".
>
> FWIW ksh93 does the same thing.


Building and launching current git...

sh-4.3$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
sh-4.3$ git pull
Already up to date.
sh-4.3$ make
make  all-recursive
...
make[2]: Leaving directory '/home/srcdevel/bbox/fix/dash-T'
make[1]: Leaving directory '/home/srcdevel/bbox/fix/dash-T'
sh-3.4$ src/dash

I see this:

$ x='\t'; echo "[$x]"
[]

I'm pretty sure this isn't okay...
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-04-02 Thread Herbert Xu
On Mon, Apr 02, 2018 at 03:48:46PM +0200, Denys Vlasenko wrote:
>
> Another probably buggy case:
> 
> cat < -\t-\\-\"-\'-\`-\--\z-\*-\?-
> `echo  '-\t-\\-\"-\x-\`-\--\z-\*-\?-'`
> $(echo '-\t-\\-\"-\x-\`-\--\z-\*-\?-')
> EOF
> 
> 
> # bash heredoc_backslash.test
> -\t-\-\"-\'-`-\--\z-\*-\?-
> -\t-\-\"-\x-`-\--\z-\*-\?-
> -\t-\\-\"-\x-\`-\--\z-\*-\?-
> 
> # dash heredoc_backslash.test
> -\t-\-\"-\'-`-\--\z-\*-\?-
> --\-"-\x-`-\--\z-\*-\?-
> --\-\"-\x-\`-\--\z-\*-\?-
> 
> (Second and third line have \t expanded to tab. Mail-mangled to spaces, 
> likely)
> 
> Apart from \t, note that in second heredoc line there is a difference in how
> `echo  '\"'` was handled: bash prints \", dash prints ".

FWIW ksh93 does the same thing.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-04-02 Thread Herbert Xu
On Mon, Apr 02, 2018 at 02:13:03PM +0200, Denys Vlasenko wrote:
>
> I was trying some of the beautiful atrocities from one of earlier
> Harald's emails and this one fails in current dash git:
> 
> # x=""; echo "${x#"${x+''}"''}"
> dash: 12: Syntax error: Missing '}'

Thanks for the report.  This patch should fix it.

---8<---
Subject: parser: Fix parameter expansion inside inner double quotes

The parsing of parameter expansion inside inner double quotes
breaks because we never look for ENDVAR while innerdq is true.

echo "${x#"${x+''}"''}

This patch fixes it by pushing the syntax stack if innerdq is
true and we enter a new parameter expansion.

This patch also fixes a corner case where a bad substitution error
occurs within arithmetic expansion.

Reported-by: Denys Vlasenko 
Fixes: ab1cecb40478 (" parser: Add syntax stack for recursive...")
Signed-off-by: Herbert Xu 

diff --git a/src/parser.c b/src/parser.c
index 6a8a4a4..a856458 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -1324,10 +1324,11 @@ badsub:
pungetc();
}
 
-   if (newsyn == ARISYNTAX && subtype > VSNORMAL)
+   if (newsyn == ARISYNTAX)
newsyn = DQSYNTAX;
 
-   if (newsyn != synstack->syntax) {
+   if ((newsyn != synstack->syntax || synstack->innerdq) &&
+   subtype != VSNORMAL) {
synstack_push(,
  synstack->prev ?:
  alloca(sizeof(*synstack)),
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-04-02 Thread Denys Vlasenko
On Sat, Mar 10, 2018 at 3:04 AM, Harald van Dijk  wrote:
> On 3/8/18 1:40 AM, Harald van Dijk wrote:
>>
>> If the syntax stack is to be stored on the actual stack, then real
>> recursion could be used instead, as attached.
>
>
> Even though it won't be accepted in dash, I continued with this approach for
> my own use. I've now got it to about 1800 bytes smaller (at -Os -s).
>
> After the other changes I'd done, it became apparent to me that the syntax
> tables were unnecessary, and that they'd become fairly easy to get rid of.
> This was a big space saver that may be possible to apply to your version as
> well.

I ported all recent dash commits to bbox ash. Nice!

I indeed feel that recursion is better implemented via function stack
recursion rather than linked list of alloca()ed structs.

Poor gcc has hard time optimizing readtoken1() atrocity as it stands.

I was trying some of the beautiful atrocities from one of earlier
Harald's emails and this one fails in current dash git:

# x=""; echo "${x#"${x+''}"''}"
dash: 12: Syntax error: Missing '}'
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-09 Thread Harald van Dijk

On 3/8/18 1:40 AM, Harald van Dijk wrote:
If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached.


Even though it won't be accepted in dash, I continued with this approach 
for my own use. I've now got it to about 1800 bytes smaller (at -Os -s).


After the other changes I'd done, it became apparent to me that the 
syntax tables were unnecessary, and that they'd become fairly easy to 
get rid of. This was a big space saver that may be possible to apply to 
your version as well.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..acd5fdf 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -115,8 +115,8 @@ STATIC char *exptilde(char *, char *, int);
 STATIC void expbackq(union node *, int);
 STATIC const char *subevalvar(char *, char *, int, int, int, int, int);
 STATIC char *evalvar(char *, int);
-STATIC size_t strtodest(const char *, const char *, int);
-STATIC void memtodest(const char *, size_t, const char *, int);
+STATIC size_t strtodest(const char *, int);
+STATIC void memtodest(const char *, size_t, int);
 STATIC ssize_t varvalue(char *, int, int, int *);
 STATIC void expandmeta(struct strlist *, int);
 #ifdef HAVE_GLOB
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -396,7 +386,7 @@ done:
 	if (!home || !*home)
 		goto lose;
 	*p = c;
-	strtodest(home, SQSYNTAX, quotes);
+	strtodest(home, quotes | EXP_QUOTED);
 	return (p);
 lose:
 	*p = c;
@@ -521,7 +511,6 @@ expbackq(union node *cmd, int flag)
 	char *p;
 	char *dest;
 	int startloc;
-	char const *syntax = flag & EXP_QUOTED ? DQSYNTAX : BASESYNTAX;
 	struct stackmark smark;
 
 	INTOFF;
@@ -535,7 +524,7 @@ expbackq(union node *cmd, int flag)
 	if (i == 0)
 		goto read;
 	for (;;) {
-		memtodest(p, i, syntax, flag & QUOTES_ESC);
+		memtodest(p, i, flag & (QUOTES_ESC | EXP_QUOTED));
 read:
 		if (in.fd < 0)
 			break;
@@ -651,8 +640,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -844,7 +832,7 @@ end:
  */
 
 STATIC void
-memtodest(const char *p, size_t len, const char *syntax, int quotes) {
+memtodest(const char *p, size_t len, int quotes) {
 	char *q;
 
 	if (unlikely(!len))
@@ -855,11 +843,17 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 	do {
 		int c = (signed char)*p++;
 		if (c) {
-			if ((quotes & QUOTES_ESC) &&
-			((syntax[c] == CCTL) ||
-			 (((quotes & EXP_FULL) || syntax != BASESYNTAX) &&
-			  syntax[c] == CBACK)))
-USTPUTC(CTLESC, q);
+			if (quotes & QUOTES_ESC) {
+switch (c) {
+	case '\\':
+	case '!': case '*': case '?': case '[': case '=':
+	case '~': case ':': case '/': case '-': case ']':
+		if (quotes & EXP_QUOTED)
+	case CTLVARS:
+			USTPUTC(CTLESC, q);
+		break;
+}
+			}
 		} else if (!(quotes & QUOTES_KEEPNUL))
 			continue;
 		USTPUTC(c, q);
@@ -870,13 +864,10 @@ memtodest(const char *p, size_t len, const char *syntax, int quotes) {
 
 
 STATIC size_t
-strtodest(p, syntax, quotes)
-	const char *p;
-	const char *syntax;
-	int quotes;
+strtodest(const char *p, int quotes)
 {
 	size_t len = strlen(p);
-	memtodest(p, len, syntax, quotes);
+	memtodest(p, len, quotes);
 	return len;
 }
 
@@ -895,15 +886,13 @@ varvalue(char *name, int varflags, int flags, int *quotedp)
 	int sep;
 	char sepc;
 	char **ap;
-	char const *syntax;
 	int quoted = *quotedp;
 	int subtype = varflags & VSTYPE;
 	int discard = subtype == VSPLUS || subtype == VSLENGTH;
-	int quotes = (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
+	int quotes = quoted | (discard ? 0 : (flags & QUOTES_ESC)) | QUOTES_KEEPNUL;
 	ssize_t len = 0;
 
 	sep = (flags & EXP_FULL) << CHAR_BIT;
-	syntax = quoted ? DQSYNTAX : BASESYNTAX;
 
 	switch (*name) {
 	case '$':
@@ -946,11 +935,11 @@ param:
 		if (!(ap = shellparam.p))
 			return -1;
 		while ((p = *ap++)) {
-			len += strtodest(p, syntax, quotes);
+			len += strtodest(p, quotes);
 
 			if (*ap && sep) {
 len++;
-memtodest(, 1, syntax, quotes);
+

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-08 Thread Harald van Dijk

On 08/03/2018 08:55, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote:


If the syntax stack is to be stored on the actual stack, then real recursion
could be used instead, as attached. I tried to avoid recursion for the cases
that unpatched dash already handled properly.


It does look a lot simpler than I expected.  However, this patch
is still 30% bigger than my patch :)


That's a bit unfair: that's mostly due to moved code, not to changed code.

But size-wise, your patch still wins: parser.o does become larger with 
my patch than with yours, largely due to my attempts to only recurse 
when needed.



As real recursion doesn't seem to buy us much I think I'll stick
with the syntax stack for now.


@@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char 
*eofmark, int striptabs)
c != '\\' && c != '`' &&
c != '$' && (
c != '"' ||
-   eofmark != NULL
+   (eofmark != NULL && 
!varnest)
+   ) && (
+   c != '}' ||
+   !varnest ||
+   (dqvarnest ? innerdq : 
dblquote)


So this seems to be the only substantial difference between your
patch and mine.  I have looked at the behaviour of other shells
and I think I will stick with my patch's behaviour for now.


The first line of this part of my patch is about something else:

  x=\"; cat <
In general, if there are disagreements between shells and the
standard is not specific enough, I'll pick the approach that
results in simpler code.


Fair enough.

Cheers,
Harald van Dijk


Thanks,


--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Herbert Xu
On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote:
>
> If the syntax stack is to be stored on the actual stack, then real recursion
> could be used instead, as attached. I tried to avoid recursion for the cases
> that unpatched dash already handled properly.

It does look a lot simpler than I expected.  However, this patch
is still 30% bigger than my patch :)

As real recursion doesn't seem to buy us much I think I'll stick
with the syntax stack for now.

> @@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char 
> *eofmark, int striptabs)
>   c != '\\' && c != '`' &&
>   c != '$' && (
>   c != '"' ||
> - eofmark != NULL
> + (eofmark != NULL && 
> !varnest)
> + ) && (
> + c != '}' ||
> + !varnest ||
> + (dqvarnest ? innerdq : 
> dblquote)

So this seems to be the only substantial difference between your
patch and mine.  I have looked at the behaviour of other shells
and I think I will stick with my patch's behaviour for now.

In general, if there are disagreements between shells and the
standard is not specific enough, I'll pick the approach that
results in simpler code.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 8:04 PM, Harald van Dijk wrote:

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( 
$(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, 
again due
to the non-recursive nature of dash's parser. A small generalisation 
of what
I've been doing so far could handle this, so it makes sense to me to 
try to

achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


If the syntax stack is to be stored on the actual stack, then real 
recursion could be used instead, as attached. I tried to avoid recursion 
for the cases that unpatched dash already handled properly.



It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.


To handle that it can just look at dblquote instead, included in the 
attached.


In this patch, I managed let "${$+\}}" expand to }, but "${$+"\}"}" to 
\}, for the reasons I gave earlier.


Martijn Dekker's test case of 'x=0; x=$((${x}+1))' also works with this.

Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..903e250 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP	0x10	/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC	(EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC	(EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL	EXP_TILDE
 
@@ -333,16 +333,6 @@ addquote:
 		case CTLESC:
 			startloc++;
 			length++;
-
-			/*
-			 * Quoted parameter expansion pattern: remove quote
-			 * unless inside inner quotes or we have a literal
-			 * backslash.
-			 */
-			if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-			EXP_QPAT && *p != '\\')
-break;
-
 			goto addquote;
 		case CTLVAR:
 			p = evalvar(p, flag | inquotes);
@@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla
 	char *(*scan)(char *, char *, char *, char *, int , int);
 
 	argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-			   (flag & (EXP_QUOTED | EXP_QPAT) ?
-			EXP_QPAT : EXP_CASE) : 0));
+			   EXP_CASE : 0));
 	STPUTC('\0', expdest);
 	argbackq = saveargbackq;
 	startp = stackblock() + startloc;
@@ -1644,7 +1633,6 @@ char *
 _rmescapes(char *str, int flag)
 {
 	char *p, *q, *r;
-	unsigned inquotes;
 	int notescaped;
 	int globbing;
 
@@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
-	inquotes = 0;
 	globbing = flag & RMESCAPE_GLOB;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
-			inquotes = ~inquotes;
 			p++;
 			notescaped = globbing;
 			continue;
 		}
+		if (*p == '\\') {
+			/* naked back slash */
+			notescaped = 0;
+			goto copy;
+		}
 		if (*p == (char)CTLESC) {
 			p++;
 			if (notescaped)
 *q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/expand.h b/src/expand.h
index 26dc5b4..90f5328 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -55,7 +55,6 @@ struct arglist {
 #define	EXP_VARTILDE	0x4	/* expand tildes in an assignment */
 #define	EXP_REDIR	0x8	/* file glob for a redirection (1 match only) */
 #define EXP_CASE	0x10	/* keeps quotes around for CASE pattern */
-#define EXP_QPAT	0x20	/* pattern in quoted parameter expansion */
 #define EXP_VARTILDE2	0x40	/* expand tildes after colons only */
 #define EXP_WORD	0x80	/* expand word in parameter expansion */
 #define EXP_QUOTED	0x100	/* expand word in double quotes */
diff --git a/src/parser.c b/src/parser.c
index 382658e..17d60b0 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -827,7 +827,8 @@ xxreadtoken(void)
 		}
 	}
 breakloop:
-	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	readtoken1(c, BASESYNTAX, (char *)NULL, 0);
+	return lasttoken;
 #undef RETURN
 }
 
@@ -855,47 +856,147 @@ static int pgetc_eatbnl(void)
  * word which marks the end of the document and striptabs is true if
  * leading tabs should be stripped from the document.  The 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Martijn Dekker
Op 07-03-18 om 16:29 schreef Herbert Xu:
> I also didn't quite like the idea of scanning the string backwards
> to find the previous syntax.  So here is my attempt at the recursive
> parsing using alloca. 

This version introduces a parsing bug:

$ src/dash -c 'x=0; x=$((${x}+1))'
src/dash: 1: Syntax error: Unterminated quoted string

It is triggered by the ${x} (with braces) within an arithmetic expression.

- M.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Harald van Dijk

On 3/7/18 5:29 PM, Herbert Xu wrote:

On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:


Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1
)) ) )). This should expand to 1, but gives a hard error in dash, again due
to the non-recursive nature of dash's parser. A small generalisation of what
I've been doing so far could handle this, so it makes sense to me to try to
achieve this at the same time.


Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.


Nice approach.


It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.


One thing I found it doesn't handle, although it does look like you try 
to support it, is ${$-"}"}, which should expand to the same thing as $$. 
This doesn't work because the first " is CDQUOTE, not CENDQUOTE, so 
synstack->innerdq doesn't get set, and there's nothing else that 
prevents the quoted } from being treated as the end of the variable 
substitution.



As I have not reverted the run-time quote book-keeping, the handling
of $@/$* remains unchanged.


I'll try to re-do those fixes based on the approach you're going for here.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-07 Thread Herbert Xu
On Sun, Mar 04, 2018 at 10:29:25PM +0100, Harald van Dijk wrote:
>
> Another pre-existing dash parsing bug that I encountered now is $(( ( $(( 1
> )) ) )). This should expand to 1, but gives a hard error in dash, again due
> to the non-recursive nature of dash's parser. A small generalisation of what
> I've been doing so far could handle this, so it makes sense to me to try to
> achieve this at the same time.

Thanks for the patches.  You have convinced that a stacked syntax
is indeed necessary.  However, I'd like to do the reversion of the
run-time quote book-keeping patch in a separate step.

I also didn't quite like the idea of scanning the string backwards
to find the previous syntax.  So here is my attempt at the recursive
parsing using alloca.  It's not completely recursive in that I've
maintained the existing behaviour of double-quotes inside parameter
expansion inside double-quotes.  However, it does seem to address
most of the issues that you have raised.

As I have not reverted the run-time quote book-keeping, the handling
of $@/$* remains unchanged.

I will explore reverting it and will decide whether to do so
depending on how the code size/performance changes.

---8<---
Subject: parser: Add syntax stack for recursive parsing

Without a stack of syntaxes we cannot correctly these two cases
together:

"${a#'$$'}"
"${a#"${b-'$$'}"}"

A recursive parser also helps in some other corner cases such
as nested arithmetic expansion with paratheses.

This patch adds a syntax stack allocated from the stack using
alloca.  As a side-effect this allows us to remove the naked
backslashes for patterns within double-quotes, which means that
EXP_QPAT also has to go.

This patch also fixes removes any backslashes that precede right
braces when they are present within a parameter expansion context.

The idea of a recursive parser is based on a patch by Harald van
Dijk.

Signed-off-by: Herbert Xu 

diff --git a/src/expand.c b/src/expand.c
index 2a50830..903e250 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP  0x10/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC (EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL EXP_TILDE
 
@@ -333,16 +333,6 @@ addquote:
case CTLESC:
startloc++;
length++;
-
-   /*
-* Quoted parameter expansion pattern: remove quote
-* unless inside inner quotes or we have a literal
-* backslash.
-*/
-   if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-   EXP_QPAT && *p != '\\')
-   break;
-
goto addquote;
case CTLVAR:
p = evalvar(p, flag | inquotes);
@@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int 
startloc, int varfla
char *(*scan)(char *, char *, char *, char *, int , int);
 
argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-  (flag & (EXP_QUOTED | EXP_QPAT) ?
-   EXP_QPAT : EXP_CASE) : 0));
+  EXP_CASE : 0));
STPUTC('\0', expdest);
argbackq = saveargbackq;
startp = stackblock() + startloc;
@@ -1644,7 +1633,6 @@ char *
 _rmescapes(char *str, int flag)
 {
char *p, *q, *r;
-   unsigned inquotes;
int notescaped;
int globbing;
 
@@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag)
q = mempcpy(q, str, len);
}
}
-   inquotes = 0;
globbing = flag & RMESCAPE_GLOB;
notescaped = globbing;
while (*p) {
if (*p == (char)CTLQUOTEMARK) {
-   inquotes = ~inquotes;
p++;
notescaped = globbing;
continue;
}
+   if (*p == '\\') {
+   /* naked back slash */
+   notescaped = 0;
+   goto copy;
+   }
if (*p == (char)CTLESC) {
p++;
if (notescaped)
*q++ = '\\';
-   } else if (*p == '\\' && !inquotes) {
-   /* naked back slash */
-   notescaped = 0;
-   goto copy;
}
notescaped = globbing;
 copy:
diff --git a/src/expand.h b/src/expand.h
index 26dc5b4..90f5328 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -55,7 +55,6 @@ struct arglist {
 #defineEXP_VARTILDE0x4 /* expand tildes in an assignment 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 9:08 PM, Martijn Dekker wrote:

Op 04-03-18 om 16:46 schreef Harald van Dijk:

FreeBSD sh also prints a blank line here.

[...]

Like above, FreeBSD sh behaves like ksh.


I stand corrected.

Is there any port of FreeBSD sh to other operating systems? It would be
much more convenient for me to include it in my tests if I didn't have
to launch a FreeBSD VM and rsync & run the test scripts separately.


None that I know of. Running the test script over ssh might be slightly 
less difficult, but nothing as easy as a port. The source code contains 
several very much FreeBSD-specific bits.



Yes, the inconsistency should be fixed. Either it should be treated as
quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I
have no strong feelings on which it should be.


Neither do I, so I would default to the behaviour that both pre-exists
in dash and corresponds with the majority of other shells.


I went for the behaviour that required the fewest changes for now, which 
is to treat them as unquoted. If it is agreed that it should be quoted, 
it requires some additional (minor) complications in the parser, because 
the existing state would no longer be sufficient to determine whether } 
should end the substitution. But yes, I agree that given how long dash 
has treated this as quoted, it makes sense to keep that, unless there's 
a compelling reason not to.



[...]

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is
quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02

| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


I believe this actually requires dash's behaviour. This says the first }
isn't examined in determining the matching '}', but only that: it just
says the parameter expansion expression is $+\}. It doesn't say the
backslash is removed.


I believe the word "escaped" implies that removal. If a '}' is escaped
by a backslash, it's implied that the backslash is removed as this
escaping is parsed, just as it's implied that quotes are removed from a
quoted string.


That's not implied, that's stated:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_07


The quote characters ( , single-quote, and double-quote) that were 
present in the original word shall be removed unless they have themselves been quoted.


In this case, the backslash was quoted, so this doesn't apply.


I agree that it would be much better to print } here though.


All other current shells except bosh (schilytools sh) agree, too -- even
FreeBSD sh, and I checked it this time.


Shells agree on the simple cases to remove the backslash:

  ${x+\}}
  "${x+\}}"
  

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Martijn Dekker
Op 04-03-18 om 16:46 schreef Harald van Dijk:
> FreeBSD sh also prints a blank line here.
[...]
> Like above, FreeBSD sh behaves like ksh.

I stand corrected.

Is there any port of FreeBSD sh to other operating systems? It would be
much more convenient for me to include it in my tests if I didn't have
to launch a FreeBSD VM and rsync & run the test scripts separately.


> Yes, the inconsistency should be fixed. Either it should be treated as
> quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I
> have no strong feelings on which it should be.

Neither do I, so I would default to the behaviour that both pre-exists
in dash and corresponds with the majority of other shells.


>>> Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
>>> "$@" got handled and reverting that changed it, I looked into how
>>> this works and fixed another bug. It also changes the handling of $*
>>> and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
>>> properly at all, even if all parameters were non-empty. dash 0.5.9.1
>>> preserves empty parameters. With this patch, they get removed just
>>> like in bash. POSIX allows for either.
>> I don't think it does. POSIX implies that empty $@ and $* generates zero
>> fields. 2.5.2 Special Parameters:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
>>
>> | @
>> |    Expands to the positional parameters, starting from one, initially
>> |    producing one field for each positional parameter that is set.
>>
>> Since there are no positional parameters, no fields should initially be
>> produced at all. Same for $*.
>>
>> So I do believe your patch correctly fixes a bug here.
> 
> By empty parameters, I meant parameters that had been set to an empty
> string. It's covered by the 'set -- a ""' in my tests.

Ah yes, sorry for misreading. You're right that POSIX allows for either
removing or not removing empty fields generated by unquoted $@ and $*.
Note that the next iteration of POSIX will likely mandate their removal,
though. See the descripton of Austin Group bug 888:
http://austingroupbugs.net/view.php?id=888


>  You're right that
> after 'set --', unquoted $@ should not produce any fields. I hadn't even
> noticed that dash got that wrong and that my patch had fixed it.

:)


[...]
>> $ src/dash -c 'printf "%s\n" "${$+\}}"'
>> \}
>>
>> Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
>> mksh, zsh. In other words: it should be possible to escape a '}' with a
>> backslash within the parameter expansion, even if the expansion is
>> quoted.
>>
>> POSIX ref.: 2.6.2 Parameter Expansion
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
>>
>> | Any '}' escaped by a  or within a quoted string, and
>> | characters in embedded arithmetic expansions, command substitutions,
>> | and variable expansions, shall not be examined in determining the
>> | matching '}'.
> 
> I believe this actually requires dash's behaviour. This says the first }
> isn't examined in determining the matching '}', but only that: it just
> says the parameter expansion expression is $+\}. It doesn't say the
> backslash is removed.

I believe the word "escaped" implies that removal. If a '}' is escaped
by a backslash, it's implied that the backslash is removed as this
escaping is parsed, just as it's implied that quotes are removed from a
quoted string.

> I agree that it would be much better to print } here though. 

All other current shells except bosh (schilytools sh) agree, too -- even
FreeBSD sh, and I checked it this time.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 7:05 PM, Denys Vlasenko wrote:

On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk  wrote:

On 02/03/2018 18:00, Denys Vlasenko wrote:


On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk 
wrote:


Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Are you sure about this? "Patterns Matching a Single Character"'s first
paragraph contains "A  character shall escape the following
character. The escaping  shall be discarded." The shell does this
first.


I have problems with "The shell does this first" statement.

It's useful to view the entire discussion of glob pattern matching
as a discussion of how fnmatch(pattern, string, flags) should behave
(even if a particular shell implementation chose to not use
C library's fnmatch() to implement its globbing).


Okay. dash has an option to use fnmatch() for pattern matching.

But POSIX says fnmatch() performs pattern matching the way the shell 
does, not the other way around. And there are a few differences that 
follow from it, because character quote status cannot be represented the 
same way in a C string as in the description of the shell.



Otherwise (IOW: if you allow gobbing to depend on shell's quoting),
rules for globbing for different applications will not be consistent.
Which would be bad.


That's already the case, and cannot be helped. The shell pattern 
"[a]"[a] is supposed to match a file named "[a]a", and does. But if that 
file exists, and I run find . -name '"[a]"[a]', I won't get any results. 
I could use find . -name '\[a\][a]' for that though. And dash, if built 
with fnmatch() support, translates it to that in order to pass it to 
fnmatch().


This matches what POSIX says: POSIX doesn't make the removal of " part 
of pattern matching, and find -name and fnmatch() *only* implement the 
pattern matching part of the shell.



As I see it, shell should massage input according to shell rules
(quote/bkslash removal et al), then use fnmatch() or glob(), or its own
internal implementations of them.

bash seems to not do it. It probably has a "combined" routine
which does both in one step, which allows quote removal
to interfere with globbing. Here's the proof:

$ x='a]'; echo _${x#[a\]]}_
_]_

In the above code, what pattern should be fed to fnmatch(),
assuming shell uses fnmatch() to implement ${x#pattern}?
Pattern should be "[a]]" because by shell rules "\]" in
an unquoted string is "]".


By the normal shell rules, there are two places backslashes get removed. 
The first is during pattern matching. Not before, during.


If fnmatch() is used to implement shell pathname expansion and other 
pattern matching, the string to pass here is literally [a\]]. Just like 
how for \*, the string to pass is literally \*, and it would be horribly 
wrong to pass * here as if it were unquoted. fnmatch() will remove the 
backslashes and take the following character literally.


The second place where the shell removes backslashes is during quote 
removal. But this takes place after pathname expansion.



But try this:

$ x='a]'; echo _${x#[a]]}_
__

Here, pattern should be "[a]]" as well - it literally is.


Here, the pattern should indeed be [a]].


But the results are different!

Evidently, bash does _not_ perform quote removal (more precisely,
backslash removal) on pattern string. Somehow, globbing code
knows \ was there.

(And this globbing code, in my opinion, also misinterprets [a\]]
as "set of 'a' or ']'", but (a) I might be wrong on this, and
(b) this is a bit offtopic, we discuss ${x#pattern} handling here).

To me, it looks that bash behavior is buggy regardless of what \]
means in glob patterns. These two should be equivalent:

x='a]'; echo _${x#[a\]]}_
x='a]'; echo _${x#[a]]}_

because they should use the same pattern for globbing match.


See above.


Alternative possibility is that pattern in ${x#pattern} is not handled
by the usual shell rules: backslashes are not removed.
This would be VERY ugly as soon as nested variable expansions are considered.


There are and there are supposed to be a few differences in how 
${x#pattern} is treated vs. pattern, described in Patterns Used for 
Filename Expansion. But backslash handling is not one of those differences.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Denys Vlasenko
On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk  wrote:
> On 02/03/2018 18:00, Denys Vlasenko wrote:
>>
>> On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk 
>> wrote:
>>>
>>> Currently:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>> <>
>>>
>>> This is what I expect, and also what bash, ksh and posh do.
>>>
>>> With your patch:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>> 
>>
>> I was looking into this specific example and I believe it is a _bash_ bug.
>>
>> The [a\]] is misinterpreted by it (and probably by many people).
>> The gist is: \] is not a valid escape for ] in set glob expression.
>> Glob sets have no escaping at all, ] can be in a set
>> if it is the first char: []abc],
>> dash can be in a set if it is first or last: [abc-],
>> [ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.
>>
>> Therefore, "[a\]]" glob pattern means "a or \, then ]".
>> Since that does not match "a", the result of ${foo#[a\]]}> should be "a".
>
> Are you sure about this? "Patterns Matching a Single Character"'s first
> paragraph contains "A  character shall escape the following
> character. The escaping  shall be discarded." The shell does this
> first.

I have problems with "The shell does this first" statement.

It's useful to view the entire discussion of glob pattern matching
as a discussion of how fnmatch(pattern, string, flags) should behave
(even if a particular shell implementation chose to not use
C library's fnmatch() to implement its globbing).

Otherwise (IOW: if you allow gobbing to depend on shell's quoting),
rules for globbing for different applications will not be consistent.
Which would be bad.

As I see it, shell should massage input according to shell rules
(quote/bkslash removal et al), then use fnmatch() or glob(), or its own
internal implementations of them.

bash seems to not do it. It probably has a "combined" routine
which does both in one step, which allows quote removal
to interfere with globbing. Here's the proof:

$ x='a]'; echo _${x#[a\]]}_
_]_

In the above code, what pattern should be fed to fnmatch(),
assuming shell uses fnmatch() to implement ${x#pattern}?
Pattern should be "[a]]" because by shell rules "\]" in
an unquoted string is "]".

But try this:

$ x='a]'; echo _${x#[a]]}_
__

Here, pattern should be "[a]]" as well - it literally is.

But the results are different!

Evidently, bash does _not_ perform quote removal (more precisely,
backslash removal) on pattern string. Somehow, globbing code
knows \ was there.

(And this globbing code, in my opinion, also misinterprets [a\]]
as "set of 'a' or ']'", but (a) I might be wrong on this, and
(b) this is a bit offtopic, we discuss ${x#pattern} handling here).

To me, it looks that bash behavior is buggy regardless of what \]
means in glob patterns. These two should be equivalent:

x='a]'; echo _${x#[a\]]}_
x='a]'; echo _${x#[a]]}_

because they should use the same pattern for globbing match.

Alternative possibility is that pattern in ${x#pattern} is not handled
by the usual shell rules: backslashes are not removed.
This would be VERY ugly as soon as nested variable expansions are considered.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/4/18 4:26 PM, Martijn Dekker wrote:

Op 04-03-18 om 11:44 schreef Harald van Dijk:

When CTLENDVAR is seen, I double-check that syntax has the expected
value. This fixes the handling of "${$+"}"}", where the inner } was
seen as ending the variable substitution.


Also looks like dash with your patch considers the "}" to be quoted:

$ src/dash -c 'IFS=}; printf %s\\n "${$+"}"}"'
}

Only AT ksh93 prints an empty string there, as it doesn't consider the
double quotes to be nested, so the "}" is unquoted. Ash produces a
syntax error, like dash before this patch. All other shells print a }.
So dash with this patch behaves like the majority.


FreeBSD sh also prints a blank line here.


This changes how "${x+"$y"}" get handled: POSIX is silent about
whether the $y should be treated as quoted. dash has treated it as
quoted for a very long time. ash has historically treated it as
unquoted. With this patch, it gets treated as unquoted.


That seems inconsistent with how it handles "${$+"}"}", in which the "}"
is treated as quoted (see above).

ksh93 is the only existing shell that treats the $y as unquoted, so I
think it would be better if dash continued to treat the $y as quoted.

Even if not, the inconsistency should be fixed.


Like above, FreeBSD sh behaves like ksh.

Yes, the inconsistency should be fixed. Either it should be treated as 
quoted or as unquoted, but not quoted-unless-it-comes-from-a-variable. I 
have no strong feelings on which it should be.



Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
"$@" got handled and reverting that changed it, I looked into how
this works and fixed another bug. It also changes the handling of $*
and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
properly at all, even if all parameters were non-empty. dash 0.5.9.1
preserves empty parameters. With this patch, they get removed just
like in bash. POSIX allows for either.

I don't think it does. POSIX implies that empty $@ and $* generates zero
fields. 2.5.2 Special Parameters:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
| @
|Expands to the positional parameters, starting from one, initially
|producing one field for each positional parameter that is set.

Since there are no positional parameters, no fields should initially be
produced at all. Same for $*.

So I do believe your patch correctly fixes a bug here.


By empty parameters, I meant parameters that had been set to an empty 
string. It's covered by the 'set -- a ""' in my tests. You're right that 
after 'set --', unquoted $@ should not produce any fields. I hadn't even 
noticed that dash got that wrong and that my patch had fixed it.



I would be a bit surprised if the patch is acceptable in its current
form, but it's worth seeing which of the current results are definitely
correct, which of the results are acceptable, which results may well be
unwanted, and which special cases I missed.


According to my tests (i.e. the modernish test suite), nearly everything
is POSIXly correct. There's only one parameter expansion problem left
that I can find, and it existed before this patch as well:

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


I believe this actually requires dash's behaviour. This says the first } 
isn't examined in determining the matching '}', but only that: it just 
says the parameter expansion expression is $+\}. It doesn't say the 
backslash is removed. Then, \} is taken as part of a double-quoted 
string, and inside double-quoted strings, a backslash that isn't 
followed by $, `, ", \ or a newline is taken as a literal backslash. I 
agree that it would be much better to print } here though.


Thanks for the testing! I'd noticed I had an off-by-one error that 
causes problems when an expansion ends in another expansion 
(${x+${x+}}). I'll try to improve it to also handle the issues you've 
pointed out.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Martijn Dekker
Op 04-03-18 om 11:44 schreef Harald van Dijk:
> When CTLENDVAR is seen, I double-check that syntax has the expected
> value. This fixes the handling of "${$+"}"}", where the inner } was
> seen as ending the variable substitution.

Also looks like dash with your patch considers the "}" to be quoted:

$ src/dash -c 'IFS=}; printf %s\\n "${$+"}"}"'
}

Only AT ksh93 prints an empty string there, as it doesn't consider the
double quotes to be nested, so the "}" is unquoted. Ash produces a
syntax error, like dash before this patch. All other shells print a }.
So dash with this patch behaves like the majority.


> This fixes more cases than just backslashes and single quotes:
> another character that's special in unquoted contexts is ~, so
> "${HOME#~}" should expand to an empty string.

Yes. That was a bug and this patch fixes it.


> This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really
> answer this, I think. POSIX says that $(( "123" )) is a syntax error,
> but doesn't address whether " is special when it appears in other
> places than directly in the $((...)). Most shells accept $((
> ${$+"123"} )), and with this patch, dash accepts it too.

I've no strong feelings about this. The change doesn't seem like it
could be harmful.


> This changes how "${x+"$y"}" get handled: POSIX is silent about
> whether the $y should be treated as quoted. dash has treated it as
> quoted for a very long time. ash has historically treated it as
> unquoted. With this patch, it gets treated as unquoted.

That seems inconsistent with how it handles "${$+"}"}", in which the "}"
is treated as quoted (see above).

ksh93 is the only existing shell that treats the $y as unquoted, so I
think it would be better if dash continued to treat the $y as quoted.

Even if not, the inconsistency should be fixed.


> Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how
> "$@" got handled and reverting that changed it, I looked into how
> this works and fixed another bug. It also changes the handling of $*
> and $@ when IFS is set but empty: dash 0.5.8 didn't handle empty IFS
> properly at all, even if all parameters were non-empty. dash 0.5.9.1
> preserves empty parameters. With this patch, they get removed just
> like in bash. POSIX allows for either.
I don't think it does. POSIX implies that empty $@ and $* generates zero
fields. 2.5.2 Special Parameters:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02
| @
|Expands to the positional parameters, starting from one, initially
|producing one field for each positional parameter that is set.

Since there are no positional parameters, no fields should initially be
produced at all. Same for $*.

So I do believe your patch correctly fixes a bug here.


> I would be a bit surprised if the patch is acceptable in its current
> form, but it's worth seeing which of the current results are definitely
> correct, which of the results are acceptable, which results may well be
> unwanted, and which special cases I missed.

According to my tests (i.e. the modernish test suite), nearly everything
is POSIXly correct. There's only one parameter expansion problem left
that I can find, and it existed before this patch as well:

$ src/dash -c 'printf "%s\n" "${$+\}}"'
\}

Expected output: }  (no backslash), as in bash 4, yash, ksh93, pdksh,
mksh, zsh. In other words: it should be possible to escape a '}' with a
backslash within the parameter expansion, even if the expansion is quoted.

POSIX ref.: 2.6.2 Parameter Expansion
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
| Any '}' escaped by a  or within a quoted string, and
| characters in embedded arithmetic expansions, command substitutions,
| and variable expansions, shall not be examined in determining the
| matching '}'.


Thanks,

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-04 Thread Harald van Dijk

On 3/2/18 11:58 AM, Harald van Dijk wrote:

On 02/03/2018 08:49, Herbert Xu wrote:

If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.


Since I didn't see how to avoid this approach, I went ahead with this 
attempt and the attached is the result.


I started out by reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c. 
Since that also removed some dead code, I re-removed the read code. I 
then modified the byte after CTLVAR so that it didn't just store whether 
the result was quoted, it stored the prior syntax, and forcibly reset 
syntax to either BASESYNTAX or DQSYNTAX as appropriate. Then, in 
CTLENDVAR, I look for the opening CTLVAR, and use that to restore the 
prior syntax. The same goes for CTLARI/CTLENDARI too.


When CTLENDVAR is seen, I double-check that syntax has the expected 
value. This fixes the handling of "${$+"}"}", where the inner } was seen 
as ending the variable substitution.


This fixes more cases than just backslashes and single quotes: another 
character that's special in unquoted contexts is ~, so "${HOME#~}" 
should expand to an empty string.


This changes how $(( ${$+"123"} )) gets handled: POSIX doesn't really 
answer this, I think. POSIX says that $(( "123" )) is a syntax error, 
but doesn't address whether " is special when it appears in other places 
than directly in the $((...)). Most shells accept $(( ${$+"123"} )), and 
with this patch, dash accepts it too.


This changes how "${x+"$y"}" get handled: POSIX is silent about whether 
the $y should be treated as quoted. dash has treated it as quoted for a 
very long time. ash has historically treated it as unquoted. With this 
patch, it gets treated as unquoted.


Since 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c had also changed how "$@" 
got handled and reverting that changed it, I looked into how this works 
and fixed another bug. It also changes the handling of $* and $@ when 
IFS is set but empty: dash 0.5.8 didn't handle empty IFS properly at 
all, even if all parameters were non-empty. dash 0.5.9.1 preserves empty 
parameters. With this patch, they get removed just like in bash. POSIX 
allows for either.


I would be a bit surprised if the patch is acceptable in its current 
form, but it's worth seeing which of the current results are definitely 
correct, which of the results are acceptable, which results may well be 
unwanted, and which special cases I missed.


Cheers,
Harald van Dijk

command:  echo "\*"
bash: \*
dash 0.5.8:   \ \
dash 0.5.9.1: \ \
dash patched: \*

command:  case \\ab in "\*") echo BUG;; esac
bash:
dash 0.5.8:   BUG
dash 0.5.9.1: BUG
dash patched:

command:  case \\a in "\?") echo BUG;; esac
bash:
dash 0.5.8:   BUG
dash 0.5.9.1: BUG
dash patched:

command:  foo=\\; echo "<${foo#[\\]]}>"
bash: <\>
dash 0.5.8:   <\>
dash 0.5.9.1: <\>
dash patched: <\>

command:  foo=a; echo "<${foo#[a\]]}>"
bash: <>
dash 0.5.8:   <>
dash 0.5.9.1: <>
dash patched: <>

command:  x=yz; echo "${x#'y'}"
bash: z
dash 0.5.8:   yz
dash 0.5.9.1: yz
dash patched: z

command:  x=yz; echo "${x+'y'}"
bash: 'y'
dash 0.5.8:   'y'
dash 0.5.9.1: 'y'
dash patched: 'y'

command:  x=""; echo "${x#"${x+''}"''}"
bash: ''
dash 0.5.8:
dash 0.5.9.1:
dash patched: ''

command:  HOME=/; echo "${HOME#~}"
bash:
dash 0.5.8:   /
dash 0.5.9.1: /
dash patched:

command:  x="13"; echo $(( ${x#'1'} ))
bash: 3
dash 0.5.8:   13
dash 0.5.9.1: 13
dash patched: 3

command:  echo $(( ${$+"123"} ))
bash: 123
dash 0.5.8:   dash: 1: arithmetic expression: expecting primary: " "123" "
dash 0.5.9.1: dash: 1: arithmetic expression: expecting primary: " "123" "
dash patched: 123

command:  set -- a ""; space=" "; printf "<%s>" "$@"$space
bash: <>
dash 0.5.8:   < >
dash 0.5.9.1: < >
dash patched: <>

command:  IFS=; set -- a b; printf "<%s>" $@
bash: 
dash 0.5.8:   
dash 0.5.9.1: 
dash patched: 

command:  IFS=; set -- a ""; printf "<%s>" $@
bash: 
dash 0.5.8:   
dash 0.5.9.1: <>
dash patched: 

command:  IFS=; set -- a ""; printf "<%s>" $*
bash: 
dash 0.5.8:   
dash 0.5.9.1: <>
dash patched: 

command:  echo "${$+"{}"}"
bash: {}
dash 0.5.8:   dash: 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 17:33, Herbert Xu wrote:

On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:


But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}",
the ' is a quote character. This part is hard. If the above is done simply
using another local variable, then the parse of the nested ${y-} would
clobber that variable.


I don't see why that's hard.  You just need to remember whether
you're in a pattern context (i.e., after a %/%% or #/##).  If you
are, then you need to go back to basesyntax instead of dqsyntax
until the next right brace.


Let me slightly modify my example so that the effect becomes different:

  "${x#"${y-*}"''}"

When the parser has processed

  "${x#"${y-

would the state be "in a pattern context", or "not in a pattern context"?

If the state is "in a pattern context", how do you prevent the next * 
from being taken as unquoted? It should be taken as quoted.


If it stores "not in a pattern context", and the parser is processing 
the last character in


  "${x#"${y-*}

how can it reset the state to "in a pattern context"? Where could this 
state be stored?


With arbitrarily deep nesting of variable expansions, I do not see how 
you can avoid requiring arbitrarily large state, which gets difficult 
with dash's non-recursive parser. Mind you, I do hope I'm missing 
something obvious here!


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 18:00, Denys Vlasenko wrote:

On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk  wrote:

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'




I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Are you sure about this? "Patterns Matching a Single Character"'s first 
paragraph contains "A  character shall escape the following 
character. The escaping  shall be discarded." The shell does 
this first. It removes the backslash (remembering that the following 
character is escaped) before it starts interpreting the result as a 
bracket expression, and so never gets to the point where the \ should be 
taken literally.


  case \] in [\]]) echo matched ;; esac

prints "matched" in all shells I can check.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Denys Vlasenko
On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk  wrote:
> Currently:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> <>
>
> This is what I expect, and also what bash, ksh and posh do.
>
> With your patch:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> 


I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 16:28, Herbert Xu wrote:

On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:



If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my
message from Feb 18, that I'd personally prefer as well. It basically
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax
to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
variable expansion starts, and finding a sensible way to change the syntax
back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an
explicit stack of syntaxes is created for this, but that might be avoidable:
with slight modifications to what gets stored in the byte after
CTLVAR/CTLARI, it might be possible to go back through the parser output to
determine the syntax to revert to. I'll see if I can get that working.


Yes but that's overkill just to fix single quote within patterns.
We already support nested double-quotes in patterns correctly.  As
single quotes cannot nest, it should be an easy fix.


Single quotes indeed cannot nest, but you do need to reliably determine 
when a single quote is a special character, which gets very tricky very 
quickly with nested substitutions.


In "${x+''}", the ' is the literal ' character. In "${x#''}", the ' is a 
quote character. This part is easy, this part is just a matter of 
setting another variable when the parse of the substitution starts.


But in "${x+${y-}''}", the ' is the literal ' character. In 
"${x#${y-}''}", the ' is a quote character. This part is hard. If the 
above is done simply using another local variable, then the parse of the 
nested ${y-} would clobber that variable.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 08:49, Herbert Xu wrote:

On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:

On 01/03/2018 00:04, Harald van Dijk wrote:

$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the
pattern is considered unquoted regardless of the outer quotation marks.
Because of that, the single quote characters should not be taken
literally, but should be taken as quoting the y. ksh, posh and zsh agree
with bash.


Unfortunately, this causes another problem with all of the backslash
approaches so far:

   x=''; printf "%s\n" "${x#''}"

This should print a blank line. (bash, ksh, posh and zsh agree.)

Here, dash's parser stores '$\$\', where $ is a control character. preglob
would need to turn this into . The problem is again that preglob
cannot increase the string length. Perhaps the parser needs to store this as
'$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it
requires some more invasive changes.


These are different issues.  dash's parser currently does not
understand nested quoting in patterns at all.  That is, if your
parameter expansion are within double quotes, then dash at the
parser level will consider the pattern to be double-quoted.  Thus
any nested single-quotes will be literals instead of actual quotes.


That's the same thing though. The problem with the backslashes is also 
that dash sees them as double-quoted when they should be seen as 
unquoted, and the approach taken in commit 
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was 
specifically to *not* fix this in the parser, but to simply have the 
parser record enough information so that quote status can be determined 
and patched up during expansion. It's just that in the case of single 
quotes, expansion was never modified to recognise them. Thinking some 
more, I don't think the parser actually records enough information to 
let that work.



If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.



Cheers,

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-01 Thread Herbert Xu
On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:
> On 01/03/2018 00:04, Harald van Dijk wrote:
> >$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
> >z
> >
> >$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
> >yz
> >
> >(That is, they are executing x=yz; echo "${x#'y'}".)
> >
> >POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the
> >pattern is considered unquoted regardless of the outer quotation marks.
> >Because of that, the single quote characters should not be taken
> >literally, but should be taken as quoting the y. ksh, posh and zsh agree
> >with bash.
> 
> Unfortunately, this causes another problem with all of the backslash
> approaches so far:
> 
>   x=''; printf "%s\n" "${x#''}"
> 
> This should print a blank line. (bash, ksh, posh and zsh agree.)
> 
> Here, dash's parser stores '$\$\', where $ is a control character. preglob
> would need to turn this into . The problem is again that preglob
> cannot increase the string length. Perhaps the parser needs to store this as
> '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it
> requires some more invasive changes.

These are different issues.  dash's parser currently does not
understand nested quoting in patterns at all.  That is, if your
parameter expansion are within double quotes, then dash at the
parser level will consider the pattern to be double-quoted.  Thus
any nested single-quotes will be literals instead of actual quotes.

If we fix this in the parser then everything should just work.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-01 Thread Harald van Dijk

On 01/03/2018 00:04, Harald van Dijk wrote:

$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the 
pattern is considered unquoted regardless of the outer quotation marks. 
Because of that, the single quote characters should not be taken 
literally, but should be taken as quoting the y. ksh, posh and zsh agree 
with bash.


Unfortunately, this causes another problem with all of the backslash 
approaches so far:


  x=''; printf "%s\n" "${x#''}"

This should print a blank line. (bash, ksh, posh and zsh agree.)

Here, dash's parser stores '$\$\', where $ is a control character. 
preglob would need to turn this into . The problem is again that 
preglob cannot increase the string length. Perhaps the parser needs to 
store this as '$\$\$\$\', $ being either CTLESC or that new CTLBACK? 
Either way, it requires some more invasive changes.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-28 Thread Harald van Dijk

On 2/28/18 11:14 PM, Denys Vlasenko wrote:

Guys, while you work on fixing this, consider taking a look at
some more parsing cases in this bbox bug:

https://bugs.busybox.net/show_bug.cgi?id=10821

I suppose some of them may fail in dash too (I did not test, sorry).


$'...' isn't supported in dash, but the underlying problem does appear 
to exist in dash too:


$ bash -c 'x=yz; echo "${x#'"'y'"'}"'
z

$ dash -c 'x=yz; echo "${x#'"'y'"'}"'
yz

(That is, they are executing x=yz; echo "${x#'y'}".)

POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the 
pattern is considered unquoted regardless of the outer quotation marks. 
Because of that, the single quote characters should not be taken 
literally, but should be taken as quoting the y. ksh, posh and zsh agree 
with bash.


But: POSIX does not say the same for +/-/..., so dash is correct there:

$ bash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

$ dash -c 'x=yz; echo "${x+'"'y'"'}"'
'y'

Because of that, for that report's follow-up comment about

  (unset x; echo "${x-$'\x41'}")

I would not have expected this to be taken as a $'...' string. ksh 
(which does support $'...' strings too) prints the literal text $'\x41', 
and so does bash if invoked as sh.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-28 Thread Denys Vlasenko
On Sat, Feb 24, 2018 at 5:52 PM, Herbert Xu  wrote:
> On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote:
>>
>> It seems like the new control character doesn't get escaped in one place
>> where it should be, and gets lost because of that:
>>
>> Unpatched:
>>
>> $ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
>> M-^IX
>>
>> Patched:
>>
>> $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
>> X
>
> Hmm, it works here.  Can you check that your Makefile is up-to-date
> and the generated syntax.c looks like this:
>
> const char basesyntax[] = {
>   CEOF,CSPCL,   CWORD,   CCTL,
>   CCTL,CCTL,CCTL,CCTL,
>   CCTL,CCTL,CCTL,CCTL,
>
> Cheers,

Guys, while you work on fixing this, consider taking a look at
some more parsing cases in this bbox bug:

https://bugs.busybox.net/show_bug.cgi?id=10821

I suppose some of them may fail in dash too (I did not test, sorry).
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 09:40, Harald van Dijk wrote:

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


And hopefully I've got a good example now:

$ touch '\www'
$ src/dash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\www

In both $z and ${x#$z} / ${y#$z}, $z is unquoted. In $z, it's 
interpreted as a literal backslash followed by a * wildcard. In ${x#$z}, 
it's interpreted as an escaped * character. I don't understand why these 
cases should be treated differently.


ksh agrees with dash.

bash also behaves strangely here. In both cases, it treats the * as 
escaped by the preceding backslash, but in $z, the backslash is 
preserved, whereas in ${x#$z} / ${y#$z}, the backslash is removed:


$ bash -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
\*a
a
\*

posh is more consistent: in both $z and in ${x#$z} / ${y#$z}, $z is 
treated as a literal backslash followed by a * wildcard:


$ posh -c 'x=\\*a y=*a z=\\*; printf "%s\n" ${x#$z} ${y#$z} $z'
*a
*a
\www

I'm having trouble telling from the spec what the expected behaviour is 
here, and because of that, whether dash's behaviour is already correct 
or needs to be modified.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-26 Thread Harald van Dijk

On 26/02/2018 08:03, Harald van Dijk wrote:

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Never mind, this is a very bad test. I forgot about echo's backslash 
handling. Had my terminal emulator given any feedback on \a, I might 
have noticed that before sending. This test doesn't show anything wrong 
in this area, but I'll check a bit more.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-25 Thread Harald van Dijk

On 2/13/18 2:53 PM, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


There's another case where this goes wrong, that isn't fixed by your, 
my, or Herbert's patches:


$ dash -c 'a=\\a; echo "${a#\*}"'

$ bash -c 'a=\\a; echo "${a#\*}"'
\a

My patch and Herbert's preserve dash's current behaviour, your patch 
makes it print a. None of that is correct, the result should be the same 
as bash.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 5:52 PM, Herbert Xu wrote:

On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote:


It seems like the new control character doesn't get escaped in one place
where it should be, and gets lost because of that:

Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X


Hmm, it works here.  Can you check that your Makefile is up-to-date


It was.


and the generated syntax.c looks like this:

const char basesyntax[] = {
   CEOF,CSPCL,   CWORD,   CCTL,
   CCTL,CCTL,CCTL,CCTL,
   CCTL,CCTL,CCTL,CCTL,


Thanks, that was what was wrong. Although mksyntax did get re-executed 
because of your Makefile change, re-executing it didn't help: mksyntax 
doesn't use parser.h at run time, only at build time. So I think the 
Makefile.am change should instead be:


--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -66,7 +66,7 @@ syntax.c syntax.h: mksyntax
 signames.c: mksignames
./$^

-mksyntax: token.h
+mksyntax: parser.h token.h

 $(HELPERS): %: %.c
$(COMPILE_FOR_BUILD) -o $@ $<

Cheers,
Harald van Dijk

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Herbert Xu
On Sat, Feb 24, 2018 at 10:47:07AM +0100, Harald van Dijk wrote:
>
> It seems like the new control character doesn't get escaped in one place
> where it should be, and gets lost because of that:
> 
> Unpatched:
> 
> $ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
> M-^IX
> 
> Patched:
> 
> $ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
> X

Hmm, it works here.  Can you check that your Makefile is up-to-date
and the generated syntax.c looks like this:

const char basesyntax[] = {
  CEOF,CSPCL,   CWORD,   CCTL,
  CCTL,CCTL,CCTL,CCTL,
  CCTL,CCTL,CCTL,CCTL,

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-24 Thread Harald van Dijk

On 2/24/18 1:33 AM, Herbert Xu wrote:

On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote:

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!


Good work guys.  However, could you check to see if this patch
works too? It passes all my tests so far.


It seems like the new control character doesn't get escaped in one place 
where it should be, and gets lost because of that:


Unpatched:

$ dash -c 'x=`printf \\\211X`; echo $x | cat -v'
M-^IX

Patched:

$ src/dash -c 'x=`printf \\\211X`; echo $x | cat -v'
X

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-23 Thread Herbert Xu
On Wed, Feb 21, 2018 at 11:47:58PM +0100, Harald van Dijk wrote:
> On 2/21/18 2:50 PM, Denys Vlasenko wrote:
> >I propose replacing this:
> 
> Agreed, that looks better. Thanks!

Good work guys.  However, could you check to see if this patch
works too? It passes all my tests so far.

---8<---
The commit "[EXPAND] Move parse-time quote flag detection to
run-time" broke the following case:

case \\ in "\*") echo bad;; esac

The reason is that the naked backslash gets attached to the newly
added backslash added for the special character "*" in preglob,
IOW you end up with "\\*" instead of just "\*" as it should be.

The reason the naked backslash exists in the first place is because
otherwise we cannot tell the difference between "\[" and "\\[" when
it occurs in a quoted pattern context such as "${var#\[}"

This patch restores the original behaviour for the buggy case
while keeping the naked backslash for the quoted pattern case.

Fixes: 7cfd8be0dc83 ("[EXPAND] Move parse-time quote flag...")
Reported-by: Denys Vlasenko 
Suggested-by: Harald van Dijk 
Signed-off-by: Herbert Xu 

diff --git a/src/Makefile.am b/src/Makefile.am
index 139355e..b5bff06 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -60,7 +60,7 @@ init.c: mkinit $(dash_CFILES)
 nodes.c nodes.h: mknodes nodetypes nodes.c.pat
./$^
 
-syntax.c syntax.h: mksyntax
+syntax.c syntax.h: mksyntax parser.h
./$^
 
 signames.c: mksignames
diff --git a/src/expand.c b/src/expand.c
index 2a50830..aeffe82 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -244,6 +244,7 @@ argstr(char *p, int flag)
CTLVAR,
CTLBACKQ,
CTLENDARI,
+   CTLBACK,
0
};
const char *reject = spclchars;
@@ -330,6 +331,7 @@ addquote:
startloc++;
}
break;
+   case CTLBACK:
case CTLESC:
startloc++;
length++;
@@ -340,7 +342,7 @@ addquote:
 * backslash.
 */
if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-   EXP_QPAT && *p != '\\')
+   EXP_QPAT && (*p != '\\' || c == CTLBACK))
break;
 
goto addquote;
@@ -373,6 +375,7 @@ exptilde(char *startp, char *p, int flag)
 
while ((c = *++p) != '\0') {
switch(c) {
+   case CTLBACK:
case CTLESC:
return (startp);
case CTLQUOTEMARK:
@@ -594,7 +597,7 @@ scanleft(
*loc2 = c;
if (match)
return loc;
-   if (quotes && *loc == (char)CTLESC)
+   if (quotes && (*loc == (char)CTLESC || *loc == (char)CTLBACK))
loc++;
loc++;
loc2++;
@@ -821,7 +824,8 @@ end:
if (subtype != VSNORMAL) {  /* skip to end of alternative */
int nesting = 1;
for (;;) {
-   if ((c = (signed char)*p++) == CTLESC)
+   if ((c = (signed char)*p++) == CTLESC ||
+   c == CTLBACK)
p++;
else if (c == CTLBACKQ) {
if (varlen >= 0)
@@ -1052,7 +1056,7 @@ ifsbreakup(char *string, int maxargs, struct arglist 
*arglist)
 
q = p;
c = *p++;
-   if (c == (char)CTLESC)
+   if (c == (char)CTLESC || c == (char)CTLBACK)
c = *p++;
 
isifs = strchr(ifs, c);
@@ -1684,7 +1688,7 @@ _rmescapes(char *str, int flag)
notescaped = globbing;
continue;
}
-   if (*p == (char)CTLESC) {
+   if (*p == (char)CTLESC || *p == (char)CTLBACK) {
p++;
if (notescaped)
*q++ = '\\';
diff --git a/src/jobs.c b/src/jobs.c
index 4f02e38..a3aef20 100644
--- a/src/jobs.c
+++ b/src/jobs.c
@@ -1386,6 +1386,7 @@ cmdputs(const char *s)
while ((c = *p++) != 0) {
str = 0;
switch (c) {
+   case CTLBACK:
case CTLESC:
c = *p++;
break;
diff --git a/src/mystring.c b/src/mystring.c
index 0106bd2..2573d1f 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -62,7 +62,7 @@ const char spcstr[] = " ";
 const char snlfmt[] = "%s\n";
 const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=',
  CTLQUOTEMARK, '\0' };
-const char 

Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-21 Thread Harald van Dijk

On 2/21/18 2:50 PM, Denys Vlasenko wrote:

I propose replacing this:


Agreed, that looks better. Thanks!

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-21 Thread Denys Vlasenko
On Mon, Feb 19, 2018 at 11:13 PM, Harald van Dijk  wrote:
> On 2/18/18 11:50 PM, Harald van Dijk wrote:
>> On 2/14/18 11:50 PM, Harald van Dijk wrote:
>>> On 2/14/18 10:44 PM, Harald van Dijk wrote:
 On 2/14/18 9:03 PM, Harald van Dijk wrote:
> On 13/02/2018 14:53, Denys Vlasenko wrote:
>> $ >'\'
>> $ >'\'
>> $ dash -c 'echo "\*"'
>> \ \
>
>
> [...]
>
> Currently:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> <>
>
> This is what I expect, and also what bash, ksh and posh do.
>
> With your patch:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> 

 Does the attached look right as an alternative? It treats a quoted
 backslash the same way as if it were preceded by CTLESC in _rmescapes. It
 passes your test case and mine, but I'll do more extensive testing.
>>>
>>> It causes preglob's string to potentially grow larger than the original.
>>> When called with RMESCAPE_ALLOC, that can be handled by increasing the
>>> buffer size, but preglob also gets called without RMESCAPE_ALLOC to modify a
>>> string in-place. That's never going to work with this approach. Back to the
>>> drawing board...
>>
>> There is a way to make it work: ensure sufficient memory is always
>> available. Instead of inserting CTLESC, which caused problems,
>> CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a no-op
>> here.
>
> It required one obvious additional trivial change to the CHECKSTRSPACE
> invocation, but with that added, the attached passed all testing I could
> think of. Does this look okay to include, did I miss something, or is there
> perhaps a better alternative?

I propose replacing this:

if (*p == (char)CTLESC) {
p++;
+   goto escape;
+   } else if (*p == '\\') {
+   if (inquotes) {
+escape:
+   if (notescaped)
+   *q++ = '\\';
+   } else {
+   /* naked back slash */
+   notescaped = 0;
+   goto copy;
+   }

with equivalent

if (*p == (char)CTLESC) {
p++;
+   goto escape;
+   }
+   if (*p == '\\') {
+   if (!inquotes) {
+   /* naked back slash */
+   notescaped = 0;
+   goto copy;
+   }
+escape:
+   if (notescaped)
+   *q++ = '\\';
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-19 Thread Harald van Dijk

On 2/18/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in 
_rmescapes. It passes your test case and mine, but I'll do more 
extensive testing.


It causes preglob's string to potentially grow larger than the 
original. When called with RMESCAPE_ALLOC, that can be handled by 
increasing the buffer size, but preglob also gets called without 
RMESCAPE_ALLOC to modify a string in-place. That's never going to work 
with this approach. Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here.


It required one obvious additional trivial change to the CHECKSTRSPACE 
invocation, but with that added, the attached passed all testing I could 
think of. Does this look okay to include, did I miss something, or is 
there perhaps a better alternative?


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..a847b2e 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -909,7 +909,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 #endif
 		CHECKEND();	/* set c to PEOF if at end of here document */
 		for (;;) {	/* until end of line or end of word */
-			CHECKSTRSPACE(4, out);	/* permit 4 calls to USTPUTC */
+			CHECKSTRSPACE(5, out);	/* permit 5 calls to USTPUTC */
 			switch(syntax[c]) {
 			case CNL:	/* '\n' */
 if (syntax == BASESYNTAX)
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-18 Thread Harald van Dijk

On 2/14/18 11:50 PM, Harald van Dijk wrote:

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


It causes preglob's string to potentially grow larger than the original. 
When called with RMESCAPE_ALLOC, that can be handled by increasing the 
buffer size, but preglob also gets called without RMESCAPE_ALLOC to 
modify a string in-place. That's never going to work with this approach. 
Back to the drawing board...


There is a way to make it work: ensure sufficient memory is always 
available. Instead of inserting CTLESC, which caused problems, 
CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
no-op here. I'm currently testing the attached.


To be honest, FreeBSD sh's approach, keeping a syntax stack to detect 
characters' meaning reliably at parse time, feels more elegant to me 
right now, but that requires invasive and therefore risky changes to 
dash's code.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..bb16a46 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -944,6 +944,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			eofmark != NULL
 		)
 	) {
+		/* Reserve extra memory in case this backslash will require later escaping. */
+		USTPUTC(CTLQUOTEMARK, out);
+		USTPUTC(CTLQUOTEMARK, out);
 		USTPUTC('\\', out);
 	}
 	USTPUTC(CTLESC, out);


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 2/14/18 10:44 PM, Harald van Dijk wrote:

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


It causes preglob's string to potentially grow larger than the original. 
When called with RMESCAPE_ALLOC, that can be handled by increasing the 
buffer size, but preglob also gets called without RMESCAPE_ALLOC to 
modify a string in-place. That's never going to work with this approach. 
Back to the drawing board...


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 2/14/18 9:03 PM, Harald van Dijk wrote:

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


[...]

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'



Does the attached look right as an alternative? It treats a quoted 
backslash the same way as if it were preceded by CTLESC in _rmescapes. 
It passes your test case and mine, but I'll do more extensive testing.


Cheers,
Harald van Dijk
diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@ _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+if (notescaped)
+	*q++ = '\\';
+			} else {
+/* naked back slash */
+notescaped = 0;
+goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-14 Thread Harald van Dijk

On 13/02/2018 14:53, Denys Vlasenko wrote:

$ >'\'
$ >'\'
$ dash -c 'echo "\*"'
\ \


Nice catch.


The cause: uses "\\*" pattern instead of "\\\*".
The fix:

 /* backslash */
 case CBACK:
 c = pgetc2();
 if (c == PEOF) {
 USTPUTC(CTLESC, out);
 USTPUTC('\\', out);
 pungetc();
 } else if (c == '\n') {
 nlprompt();
 } else {
 if (
 dblquote &&
 c != '\\' && c != '`' &&
 c != '$' && (
 c != '"' ||
 eofmark != NULL
 )
 ) {
USTPUTC(CTLESC, out); // add this line
 USTPUTC('\\', out);
 }
 USTPUTC(CTLESC, out);
 USTPUTC(c, out);
 quotef++;
 }


I don't think this is right. The bug was introduced in




Prior to that, the line USTPUTC(CTLESC, out); was there. The commit 
message is saying that the logic for detecting whether \ should be taken 
literally doesn't belong in the parser, that the parser will get it 
wrong. The example in the commit message doesn't break with your patch, 
but short modifications to that example do make it fail:


Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-02-13 Thread Martijn Dekker
Op 13-02-18 om 14:53 schreef Denys Vlasenko:
> $ >'\'
> $ >'\'
> $ dash -c 'echo "\*"'
> \ \
> 
> The cause: uses "\\*" pattern instead of "\\\*".

Also:

$ dash -c 'case \\ab in "\*") echo BUG;; esac'
BUG
$ dash -c 'case \\a in "\?") echo BUG;; esac'
BUG

Yup. Full globbing within double quotes after a backslash.

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html