Expansion-lookalikes in heredoc delimiters

2018-03-08 Thread Harald van Dijk

Consider this:

  cat <<`bad`
  `bad`

As far as I can tell, this is technically valid, supposed to print 
nothing, and accepted in most other shells.


POSIX Token Recognition says `bad` is to be recognised as a single token 
of type word, and any word can act as a heredoc delimiter. POSIX 
Here-Document says only quote removal is performed on the word.


However, dash does not always preserve the original spelling of the 
word. That's what's going on here. Because dash hasn't preserved the 
`bad` spelling (it's been turned into CTLBACKQ), the check for the end 
of the heredoc doesn't pick it up, and it instead gets taken as a 
command substitution.


When an actual 0x84 byte is seen in the input, *that* gets taken as the 
heredoc delimiter:


  dash -c "tail -n1 <<\`:\`
  `printf \\\204`
  ok
  \`:\`"

In a locale in which 0x84 is a valid character (since dash doesn't 
support locales, that's easy, it's always valid), it's supposed to print 
"ok". dash instead interprets the second line as the end of the heredoc 
and subsequently issues an error message when it interprets "ok" on line 
3 as a command to execute.


This is pretty clearly a case that no serious script is ever going to 
encounter, not to mention one that many shells don't even attempt to 
support, at least not completely, so I don't think this is a real 
problem. I'm mentioning it anyway because I was trying to come up with a 
few more test cases for the parser, and I think it's good to have a 
record not only of what worked, what has been made to work, and what got 
broken, but also of what's never going to be work.


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] parser: Add syntax stack for recursive parsing

2018-03-08 Thread Herbert Xu
On Thu, Mar 08, 2018 at 12:36:04PM +0100, Harald van Dijk wrote:
>
> The first line of this part of my patch is about something else:
> 
>   x=\"; cat <   ${x#"\""}
>   EOF
> 
> This shouldn't print anything.

Right.  So here is a new patch with that change thrown in.

---8<---
Without a stack of syntaxes we cannot correctly these two cases
together:

"${a#'$$'}"
"${a#"${b-'$$'}"}"

A recursive parser also helps in some other corner cases such
as nested arithmetic expansion with paratheses.

This patch adds a syntax stack allocated from the stack using
alloca.  As a side-effect this allows us to remove the naked
backslashes for patterns within double-quotes, which means that
EXP_QPAT also has to go.

This patch also fixes removes any backslashes that precede right
braces when they are present within a parameter expansion context,
and backslashes that precede double quotes within inner double
quotes inside a parameter expansion in a here-document context.

The idea of a recursive parser is based on a patch by Harald van
Dijk.

Signed-off-by: Herbert Xu 

diff --git a/src/expand.c b/src/expand.c
index 2a50830..903e250 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -83,7 +83,7 @@
 #define RMESCAPE_HEAP  0x10/* Malloc strings instead of stalloc */
 
 /* Add CTLESC when necessary. */
-#define QUOTES_ESC (EXP_FULL | EXP_CASE | EXP_QPAT)
+#define QUOTES_ESC (EXP_FULL | EXP_CASE)
 /* Do not skip NUL characters. */
 #define QUOTES_KEEPNUL EXP_TILDE
 
@@ -333,16 +333,6 @@ addquote:
case CTLESC:
startloc++;
length++;
-
-   /*
-* Quoted parameter expansion pattern: remove quote
-* unless inside inner quotes or we have a literal
-* backslash.
-*/
-   if (((flag | inquotes) & (EXP_QPAT | EXP_QUOTED)) ==
-   EXP_QPAT && *p != '\\')
-   break;
-
goto addquote;
case CTLVAR:
p = evalvar(p, flag | inquotes);
@@ -651,8 +641,7 @@ subevalvar(char *p, char *str, int strloc, int subtype, int 
startloc, int varfla
char *(*scan)(char *, char *, char *, char *, int , int);
 
argstr(p, EXP_TILDE | (subtype != VSASSIGN && subtype != VSQUESTION ?
-  (flag & (EXP_QUOTED | EXP_QPAT) ?
-   EXP_QPAT : EXP_CASE) : 0));
+  EXP_CASE : 0));
STPUTC('\0', expdest);
argbackq = saveargbackq;
startp = stackblock() + startloc;
@@ -1644,7 +1633,6 @@ char *
 _rmescapes(char *str, int flag)
 {
char *p, *q, *r;
-   unsigned inquotes;
int notescaped;
int globbing;
 
@@ -1674,24 +1662,23 @@ _rmescapes(char *str, int flag)
q = mempcpy(q, str, len);
}
}
-   inquotes = 0;
globbing = flag & RMESCAPE_GLOB;
notescaped = globbing;
while (*p) {
if (*p == (char)CTLQUOTEMARK) {
-   inquotes = ~inquotes;
p++;
notescaped = globbing;
continue;
}
+   if (*p == '\\') {
+   /* naked back slash */
+   notescaped = 0;
+   goto copy;
+   }
if (*p == (char)CTLESC) {
p++;
if (notescaped)
*q++ = '\\';
-   } else if (*p == '\\' && !inquotes) {
-   /* naked back slash */
-   notescaped = 0;
-   goto copy;
}
notescaped = globbing;
 copy:
diff --git a/src/expand.h b/src/expand.h
index 26dc5b4..90f5328 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -55,7 +55,6 @@ struct arglist {
 #defineEXP_VARTILDE0x4 /* expand tildes in an assignment */
 #defineEXP_REDIR   0x8 /* file glob for a redirection (1 match 
only) */
 #define EXP_CASE   0x10/* keeps quotes around for CASE pattern */
-#define EXP_QPAT   0x20/* pattern in quoted parameter expansion */
 #define EXP_VARTILDE2  0x40/* expand tildes after colons only */
 #define EXP_WORD   0x80/* expand word in parameter expansion */
 #define EXP_QUOTED 0x100   /* expand word in double quotes */
diff --git a/src/parser.c b/src/parser.c
index 382658e..3aeb9f6 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -80,6 +80,18 @@ struct heredoc {
int striptabs;  /* if set, strip leading tabs */
 };
 
+struct synstack {
+   const char *syntax;
+   struct synstack *prev;
+   struct synstack *next;
+   int innerdq;
+   int varpushed;
+   int dblquote;
+   

Re: [PATCH v2] parser: Add syntax stack for recursive parsing

2018-03-08 Thread Herbert Xu
On Thu, Mar 08, 2018 at 10:05:01AM +, Martijn Dekker wrote:
> Op 08-03-18 om 07:52 schreef Herbert Xu:
> > This patch should fix this problem as well as the one Harald identified:
> 
> Tested. It works well on my end.
> 
> Note it leaves one bug Harald's patch fixed:
> 
> $ src/dash -c 'IFS=; set -- $@ $*; echo $#'
> 2
> 
> (expected output: 0)

Right.  This is fixed by the reversion of the run-time bookkeeping
of quote marks.  I haven't decided whether to revert that or not
yet so the fix will depend on that.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-08 Thread Harald van Dijk

On 08/03/2018 08:55, Herbert Xu wrote:

On Thu, Mar 08, 2018 at 01:40:32AM +0100, Harald van Dijk wrote:


If the syntax stack is to be stored on the actual stack, then real recursion
could be used instead, as attached. I tried to avoid recursion for the cases
that unpatched dash already handled properly.


It does look a lot simpler than I expected.  However, this patch
is still 30% bigger than my patch :)


That's a bit unfair: that's mostly due to moved code, not to changed code.

But size-wise, your patch still wins: parser.o does become larger with 
my patch than with yours, largely due to my attempts to only recurse 
when needed.



As real recursion doesn't seem to buy us much I think I'll stick
with the syntax stack for now.


@@ -941,20 +1042,25 @@ readtoken1(int firstc, char const *syntax, char 
*eofmark, int striptabs)
c != '\\' && c != '`' &&
c != '$' && (
c != '"' ||
-   eofmark != NULL
+   (eofmark != NULL && 
!varnest)
+   ) && (
+   c != '}' ||
+   !varnest ||
+   (dqvarnest ? innerdq : 
dblquote)


So this seems to be the only substantial difference between your
patch and mine.  I have looked at the behaviour of other shells
and I think I will stick with my patch's behaviour for now.


The first line of this part of my patch is about something else:

  x=\"; cat <
In general, if there are disagreements between shells and the
standard is not specific enough, I'll pick the approach that
results in simpler code.


Fair enough.

Cheers,
Harald van Dijk


Thanks,


--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html