0.braceexp.cpp

Martin Sebor Tue, 19 Feb 2008 12:01:22 -0800

Travis Vitek wrote:


Martin,

Thank you for the additional testcases. They point out a few issues that I
didn't interpret from the description in the Bash Reference Manual
[http://www.gnu.org/software/bash/manual/bashref.html#Brace-Expansion]. Note
that below I refer to paragraphs from this documentation.


In case it wasn't clear from my comments, most of the new test cases
(all those in run_bash_tests()), including the expected output, came
from the Bash test suite.


I do have a few issues with the expectations you've laid out. Comments
follow...


sebor-2 wrote:

+
+    TEST ("foo {1,2} bar", "foo 1 2 bar");
+


This isn't a brace expansion. It is a literal string, followed by a brace
expansion, followed by a literal string. When you run 'echo foo {1,2} bar'
in the shell, each of the args are brace expanded individually, so the only
thing that is brace expanded is the '{1,2}' and everything else is written
literally. I believe this testcase is invalid.


So rw_brace_expand() only handles a single brace expression? I.e.,
a pattern like this:

 pattern  ::= [ <preamble> ] '{' <list> | <seq-exp> '}' [ <postfix> ]
 list     ::= string [ , <list> ] | string
 seq-expr ::= <char> .. <char> | <number> .. <number>

If that's so, what's the definition of <preamble> and <postfix>?

FWIW, the grammar I suggested here http://tinyurl.com/2rs3he allows
multiple brace expressions:

 string     ::= <brace-expr> | [ <chars> ]
 brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
 brace-list ::= <string> ',' <brace-list> | <string>
 chars      ::= <pcs-char> <string> | <pcs-char>
 pcs-char   ::= character in the Portable Character Set

We don't need to follow it but if we choose not to we should document
what grammar we do follow.



sebor-2 wrote:

+    // we don't have eval
+    // TEST ("`zecho foo {1,2} bar`",  "foo 1 2 bar");
+    // TEST ("$(zecho foo {1,2} bar)", "foo 1 2 bar");


Same problem here.


I left these in place just for completeness but I don't expect us to
ever implement eval. Otherwise, I agree they're the same as the test
case above.



sebor-2 wrote:

+#if 0   // not implemented yet
+
+    // set the three variables
+    rw_putenv ("var=baz:varx=vx:vary=vy");
+
+    TEST ("foo{bar,${var}.}", "foobar foobaz.");
+    TEST ("foo{bar,${var}}",  "foobar foobaz");
+
+    TEST ("${var}\"{x,y}",    "bazx bazy");
+    TEST ("$var{x,y}",        "vx vy");
+    TEST ("${var}{x,y}",      "bazx bazy");
+
+    // unset all three variables
+    rw_putenv ("var=:varx=:vary=");
+
+#endif   // 0


I don't expect this functionality to ever be implemented inside
rw_brace_expand(). As mentioned in paragraph 4, the brace expansion itself
is done before other expansions, and it does not interpret the text between
the braces.

Given this, I feel that the environment variable expansion must done at some
later stage, by some other function, and the above test block is
inappropriate for this test.


Okay. Again, I included them for completeness (I didn't want to just
completely remove some test cases), but if it makes sense to do this
expansion at a later stage in some other function that calls
rw_brace_expand() that's fine with me



sebor-2 wrote:

+
+    TEST ("{1..10}", "1 2 3 4 5 6 7 8 9 10");
+


This is a case that I should be handling. I need to go back and add complete
support for integer ranges, specifically ranges that include multidigit
numbers and sign.


sebor-2 wrote:

+    // this doesn't work in Bash 3.2
+    // TEST ("{0..10,braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
+


I don't know how anyone could expect this to work. The first subexpression
of the brace expansion list is '0..10', which itself is not a brace
expansion, so it should not be expanded. It should be left as a literal.
This happens to be the behavior I see with Bash 3.0.


Yes, but from the comment above the test case in the braces.test file
it looks like they plan to make it work. The comments says: # this
doesn't work yet. I don't think it's an important use case and I have
no problem with not implementing it (for now).



sebor-2 wrote:

+    // but this does
+    TEST ("{{0..10},braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
+    TEST ("x{{0..10},braces}y",
+          "x0y x1y x2y x3y x4y x5y x6y x7y x8y x9y x10y xbracesy");
+


Obviously, both of these are valid versions of the previous test expression.


sebor-2 wrote:

+    TEST ("{a..A}",
+          "a ` _ ^ ]  [ Z Y X W V U T S R Q P O N M L K J I H G F E D C B
A");
+    TEST ("{A..a}",
+          "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ `
a");
+


Interesting. I didn't think it would make sense to allow mixing of lower and
uppercase characters in the sequence expression because of the characters
between 'Z' and 'a'. Obviously I was wrong. BTW, any idea what happened to
ASCII 92? It is the backslash character that should appear between '[' and
']'.


No idea. To me this is one of the most dubious of the features since
it assumes ASCII. I wouldn't be at all upset if we didn't implement
it (at least not until we actually need it for something ;-)



sebor-2 wrote:

+
+    TEST ("0{1..9} {10..20}",
+          "01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20");


This has the same problem as the first issue I brought up. This is actually
two seperate brace expansions, the first is '0{1..9}' and the second is
'{10..20}'. This is how the shell handles them, and this is how I handle
them.

If they were treated as one brace expansion by the shell, I would expect the
postscript '{10..20}' expanded for each prefix/body expansion, much like you
would see if you escaped the space.


I realize those are two brace expressions but I don't understand why
the function shouldn't be able to handle them as such. That's what
the expected output assumes, isn't it? I expect this to come up in
our uses of the feature so if rw_brace_expand() doesn't implement
it we'll have to implement it somewhere else. Do you see a problem
with implementing it in rw_brace_expand()?



sebor-2 wrote:

+    // weirdly-formed brace expansions -- fixed in post-bash-3.1
+    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");


I don't understand how this could be interpreted as valid brace expansion at
all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph 1
for that matter] require a correctly-formed brace expansion have unquoted
[unescaped?] opening and closing braces, and at least one unquoted comma or
a valid sequence expression. The body does not meet either of these
requirements, so it must be invalid.

To get the result shown, the obvious thing to do is to escape the outer
braces. This would give us the valid expression 'a-\{b{d,e}\}-c', that
happens to also work with previous versions of bash also.


sebor-2 wrote:

+    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");


Again, this does not seem correct according to the requirements of paragraph
5 [and 1].


Obviously, these last two are corner cases (as the comment above
them indicates). I don't think we'll ever try to do anything as
bizarre as this. What is interesting about them is the fact that
the Bash maintainers cared enough about them to include them in
their (otherwise pretty small) test suite.


If the body is supposed to be between a pair of braces, shouldn't the first
unescaped opening brace match the first unescaped close brace at the same
brace depth? If it is, then the outer brace expansion isn't valid because it
doesn't have a terminating close brace. Even if one was added, the resulting
expression has the same problem as the previous example. The nested
expression 'bdef-{g,i}-c' isn't a series comma-seperated strings or a

sequence expression.

If you wanted the first brace to be ignored, as it is in the test, then it
should be escaped. Then we would have 'a-\{bdef-{g,i}-c'. That expression
follows the requirements outlined in the manual, and works with old versions
of bash, and a human can pretty easily figure out what the expected result
would be.

Now I suppose that since invalid brace expansions are to be left unchanged,
you could say that the first brace expansion is copied literally because it
is invalid, but the second is valid and should be expanded. This almost
explains how bash 3.2 gets these results, but it still seems wrong. If a
subexpression is invalid it seems that the whole expression is invalid.


sebor-2 wrote:

+    TEST ("{",     "{");
+    TEST ("}",     "}");
+    TEST ("{}",    "{}");
+    TEST ("{ }",   "{ }");
+    TEST ("{  }",  "{  }");   // is this right?


I sure think it is. Again, the requirements say that these are not valid
brace expansions, so they should be left unchanged. I'm wondering if the
shell is doing some sort of whitespace collapse. Everything seems to work
fine if you escape the spaces, so I'm thinking that is why you see the
behavior that you do.


Yes, I believe that's exactly what it does, and I wondered if
rw_brace_expand() should do the same thing. I didn't spend too much
time contemplating whether it makes sense at this level or if it's
just a consequence of the shell tokenization.


So, with all that said, I've got a few thoughts.

1. I don't really like the idea of trying to emulate all behavior of the
shell in rw_brace_expand. If we want that, then we should have made a bug
entitled 'provide a complete implementation of bash'.
2. I don't feel comfortable trying to maintain compatibility with version
3.2 of bash. It doesn't seem to follow the documented requirements, and I
believe that the odd behavior may be difficult to implement. The bash 3.0
implementation seems much more sane and that is what I tried to emulate when
writing this code.
3. If you, er, we want to do brace expansion exactly like you see within
bash, then we should write another function that tokenizes a string on
whitespace and does brace expansion on each token. I was expecting the
caller of rw_brace_expand() to expect the function to do brace expansion,
not complete shell emulation.


Well, I thought since we were implementing a Bash feature the Bash
test cases would be useful. If we make a conscious decision to either
deviate from the Bash behavior or to not implement the features we
don't expect to use that's okay with me as long as we document what
our behavior is. One (IMO easy) way to do it is in in the test suite
in the form of test cases, with an explanation for any differences
with Bash. I tend to use the test cases for the test driver when I
need to know how something works.

Martin

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Reply via email to