Expansion-lookalikes in heredoc delimiters
Consider this: cat <<`bad` `bad` As far as I can tell, this is technically valid, supposed to print nothing, and accepted in most other shells. POSIX Token Recognition says `bad` is to be recognised as a single token of type word, and any word can act as a heredoc delimiter. POSIX Here-Document says only quote removal is performed on the word. However, dash does not always preserve the original spelling of the word. That's what's going on here. Because dash hasn't preserved the `bad` spelling (it's been turned into CTLBACKQ), the check for the end of the heredoc doesn't pick it up, and it instead gets taken as a command substitution. When an actual 0x84 byte is seen in the input, *that* gets taken as the heredoc delimiter: dash -c "tail -n1 <<\`:\` `printf \\\204` ok \`:\`" In a locale in which 0x84 is a valid character (since dash doesn't support locales, that's easy, it's always valid), it's supposed to print "ok". dash instead interprets the second line as the end of the heredoc and subsequently issues an error message when it interprets "ok" on line 3 as a command to execute. This is pretty clearly a case that no serious script is ever going to encounter, not to mention one that many shells don't even attempt to support, at least not completely, so I don't think this is a real problem. I'm mentioning it anyway because I was trying to come up with a few more test cases for the parser, and I think it's good to have a record not only of what worked, what has been made to work, and what got broken, but also of what's never going to be work. Cheers, Harald van Dijk -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] parser: Add syntax stack for recursive parsing
On Thu, Mar 08, 2018 at 12:36:04PM +0100, Harald van Dijk wrote: > > The first line of this part of my patch is about something else: > > x=\"; cat < ${x#"\""} > EOF > > This shouldn't print anything. Right. So here is a new patch with that change thrown in. ---8<--- Without a stack of syntaxes we cannot correctly these two cases together: "${a#'$$'}" "${a#"${b-'$$'}"}" A recursive parser also helps in some other corner cases such as nested arithmetic expansion with paratheses. This patch adds a syntax stack allocated from the stack using alloca. As a side-effect this allows us to remove the naked backslashes for patterns within double-quotes, which means that EXP_QPAT also has to go. This patch also fixes removes any backslashes that precede right braces when they are present within a parameter expansion context, and backslashes that precede double quotes within inner double quotes inside a parameter expansion in a here-document context. The idea of a recursive parser is based on a patch by Harald van Dijk. Signed-off-by: Herbert Xudiff --git a/src/expand.c b/src/expand.c index 2a50830..903e250 100644 --- a/src/expand.c +++ b/src/expand.c @@ -83,7 +83,7 @@ #define RMESCAPE_HEAP 0x10/* Malloc strings instead of stalloc */ /* Add CTLESC when necessary. */ -#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT) +#define QUOTES_ESC (EXP_FULL | EXP_CASE) /* Do not skip NUL characters. */ #define QUOTES_KEEPNUL EXP_TILDE @@ -333,16 +333,6 @@ addquote: case CTLESC: startloc++; length++; - - /* -* Quoted parameter expansion pattern: remove quote -* unless inside inner quotes or we have a literal -* backslash. -*/ - if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) == - EXP_QPAT && *p != '\\') - break; - goto addquote; case CTLVAR: p = evalvar(p, flag | inquotes); @@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int startloc, int varfla char *(*scan)(char *, char *, char *, char *, int , int); argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ? - (flag & (EXP_QUOTED | EXP_QPAT) ? - EXP_QPAT : EXP_CASE) : 0)); + EXP_CASE : 0)); STPUTC('\0', expdest); argbackq = saveargbackq; startp = stackblock() + startloc; @@ -1644,7 +1633,6 @@ char * _rmescapes(char *str, int flag) { char *p, *q, *r; - unsigned inquotes; int notescaped; int globbing; @@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag) q = mempcpy(q, str, len); } } - inquotes = 0; globbing = flag & RMESCAPE_GLOB; notescaped = globbing; while (*p) { if (*p == (char)CTLQUOTEMARK) { - inquotes = ~inquotes; p++; notescaped = globbing; continue; } + if (*p == '\\') { + /* naked back slash */ + notescaped = 0; + goto copy; + } if (*p == (char)CTLESC) { p++; if (notescaped) *q++ = '\\'; - } else if (*p == '\\' && !inquotes) { - /* naked back slash */ - notescaped = 0; - goto copy; } notescaped = globbing; copy: diff --git a/src/expand.h b/src/expand.h index 26dc5b4..90f5328 100644 --- a/src/expand.h +++ b/src/expand.h @@ -55,7 +55,6 @@ struct arglist { #defineEXP_VARTILDE0x4 /* expand tildes in an assignment */ #defineEXP_REDIR 0x8 /* file glob for a redirection (1 match only) */ #define EXP_CASE 0x10/* keeps quotes around for CASE pattern */ -#define EXP_QPAT 0x20/* pattern in quoted parameter expansion */ #define EXP_VARTILDE2 0x40/* expand tildes after colons only */ #define EXP_WORD 0x80/* expand word in parameter expansion */ #define EXP_QUOTED 0x100 /* expand word in double quotes */ diff --git a/src/parser.c b/src/parser.c index 382658e..3aeb9f6 100644 --- a/src/parser.c +++ b/src/parser.c @@ -80,6 +80,18 @@ struct heredoc { int striptabs; /* if set, strip leading tabs */ }; +struct synstack { + const char *syntax; + struct synstack *prev; + struct synstack *next; + int innerdq; + int varpushed; + int dblquote; +
Re: [PATCH v2] parser: Add syntax stack for recursive parsing
On Thu, Mar 08, 2018 at 10:05:01AM +, Martijn Dekker wrote: > Op 08-03-18 om 07:52 schreef Herbert Xu: > > This patch should fix this problem as well as the one Harald identified: > > Tested. It works well on my end. > > Note it leaves one bug Harald's patch fixed: > > $ src/dash -c 'IFS=; set -- $@ $*; echo $#' > 2 > > (expected output: 0) Right. This is fixed by the reversion of the run-time bookkeeping of quote marks. I haven't decided whether to revert that or not yet so the fix will depend on that. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 08/03/2018 08:55, Herbert Xu wrote: On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote: If the syntax stack is to be stored on the actual stack, then real recursion could be used instead, as attached. I tried to avoid recursion for the cases that unpatched dash already handled properly. It does look a lot simpler than I expected. However, this patch is still 30% bigger than my patch :) That's a bit unfair: that's mostly due to moved code, not to changed code. But size-wise, your patch still wins: parser.o does become larger with my patch than with yours, largely due to my attempts to only recurse when needed. As real recursion doesn't seem to buy us much I think I'll stick with the syntax stack for now. @@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) c != '\\' && c != '`' && c != '$' && ( c != '"' || - eofmark != NULL + (eofmark != NULL && !varnest) + ) && ( + c != '}' || + !varnest || + (dqvarnest ? innerdq : dblquote) So this seems to be the only substantial difference between your patch and mine. I have looked at the behaviour of other shells and I think I will stick with my patch's behaviour for now. The first line of this part of my patch is about something else: x=\"; cat < In general, if there are disagreements between shells and the standard is not specific enough, I'll pick the approach that results in simpler code. Fair enough. Cheers, Harald van Dijk Thanks, -- To unsubscribe from this list: send the line "unsubscribe dash" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html