Kang-Che Sung wrote in
 <caddzafmvspppxipdbw-gj+ut+ce8xr7i-ffz7ahfgfxk5qh...@mail.gmail.com>:
 |Steffen Nurpmeso <[email protected]> wrote on Wednesday, 3 July 2024:
 |> Kang-Che Sung wrote in
 |>|When it comes to unusual filenames, the GNU way of doing it is implementing
 |>|a `--null` option that accepts the list of filenames separated by ASCII NUL
 |>|characters.
 |>|
 |>|Various other utilities can print the filename list with NUL as the
 |>|separator, for example the `-print0` primary of `find(1)`.
 |>
 |> This (at least, i was too lazy to look) is also part of the new
 |> POSIX standard released in June.  I.e., that NUL thing seems "to
 |> come"; it *could* be that there are other issues lying around for
 |> the next standard.
 |>
 |>   ...
 |>
 |> (Nonetheless quoting in the shell language is a must
 |>
 |>   80092          The application shall quote the following characters if they are to represent themselves:
 |>   80093          |    &     ;    <    >   (    )    $    `    \     "    '    <space>         <tab>          <newline>
 |>
 |> and POSIX 2024 adds the $'' dollar single quote mechanism (dash is
 |> about to implement it / has just recently done so), and for tools
 |> producing output for the (interaction with the) shell that thus
 |> seems useful to have; i do not know how portable "IFS= xy" is..)
 |>
 |
 |Just FYI, there is a portable alternative to $'' (dollar-single-quote)
 |for passing special characters in the shell: $(printf '...') with
 |command substitution.

You mean the %q format?  That is not standardized.

   %q     ARGUMENT is printed in a format that can be reused as shell
          input, escaping non-printable characters with the proposed
          POSIX $'' syntax.

Just like bash(1)'s ${parameter@operator}:

    Q      The expansion is a string that is the value  of  parameter
           quoted in a format that can be reused as input.
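
Since %q and @Q are not standardized, a tool that emits names for the
shell can do the classic quoting itself.  A minimal sketch, assuming
plain POSIX single quotes are good enough (the name shell_squote is
made up for illustration, it is not from any of the tools mentioned):
wrap the string in '...' and write each embedded ' as '\''.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc(3)ed copy of s wrapped in single
 * quotes so that a POSIX shell reads it back as one literal word.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *shell_squote(const char *s){
   size_t len = 2 + 1; /* Two enclosing quotes plus the terminating NUL */
   const char *p;
   char *out, *q;

   assert(s != NULL);

   /* An embedded ' expands to the four bytes '\'' */
   for(p = s; *p != '\0'; ++p)
      len += (*p == '\'') ? 4 : 1;

   if((out = malloc(len)) == NULL)
      return NULL;

   q = out;
   *q++ = '\'';
   for(p = s; *p != '\0'; ++p){
      if(*p == '\''){
         memcpy(q, "'\\''", 4); /* Close quote, escaped quote, reopen */
         q += 4;
      }else
         *q++ = *p;
   }
   *q++ = '\'';
   *q = '\0';
   return out;
}
```

This keeps every byte as-is, so a newline or an escape sequence in a
name is still written raw; it makes the output safe to paste back into
a shell, not pleasant to read on a terminal.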

 |It is useful if the special characters are known ahead of time, and it's
 |not a complete substitute for `ls --quoting-style=shell` nor `ls --zero`.

Hm.

 |I'm not sure what the use case of the original reporter (Ian Norton) is,
 |but it's simply not part of the goal for `tar -tf foo.tar` to output or
 |escape special characters in filenames.
 |
 |In other words, there's no bug here, just a UX inconvenience that special
 |characters are not displayed properly.
 |
 |* If you want a secure protocol for outputting filenames or accepting
 |filenames in tar(1) and other utilities, then the `--null` option is the
 |way to go. Human readability of the filenames is secondary for this use case.
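
For the consuming side of such a NUL-separated list, a minimal sketch
in C, assuming POSIX.1-2008 getdelim(3) is available (the function
name next_nul_name is made up for illustration): each call hands back
one complete name, whatever newlines or other oddities it contains.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read the next NUL-terminated name from fp.
 * *buf and *cap are managed by getdelim(3); start with NULL and 0 and
 * free(*buf) once done.  Returns the name, or NULL on EOF or error. */
const char *next_nul_name(FILE *fp, char **buf, size_t *cap){
   ssize_t len;

   assert(fp != NULL && buf != NULL && cap != NULL);

   /* getdelim() keeps the delimiter in the buffer, and that delimiter
    * is the NUL that terminates the name as a C string */
   len = getdelim(buf, cap, '\0', fp);
   return (len < 0) ? NULL : *buf;
}
```

A caller would loop until NULL is returned, for example on stdin fed
by `find . -print0`.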

Well, one could check isatty(3), for example.
Things are easier if you also know you are in a Unicode-aware
environment: then you can simply map control characters onto the
Control Pictures block at U+2400, aka do

     if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
           wc != '\t'){
        if ((wc & ~S(wchar_t,037)) == 0)
           wc = isuni ? 0x2400 | wc : '?';
        else if(wc == 0177)
           wc = isuni ? 0x2421 : '?';
        else
           wc = isuni ? 0x2426 : '?';

but beyond that one has to be aware of L-TO-R and R-TO-L marks,
zero-width and non-characters; the most brutal approach simply drops
them (isuni tells us that the character set, aka wchar_t, is real
Unicode):

       }else if(isuni){ /* TODO ctext */
          /* Need to filter out L-TO-R and R-TO-L marks (U+200E, U+200F)
           * and bidi embeddings/overrides (U+202A..U+202E) TODO ctext */
          if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
             continue;
          /* And some zero-width messes: soft hyphen, ZWSP, ZWNJ, ZWJ */
          if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
             continue;
          /* Oh about the ISO C wide character interfaces, baby! (BOM) */
          if(wc == 0xFEFF)
             continue;
       }
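
Put together, the two fragments could look roughly like the following
self-contained sketch.  The names sanitize_name and is_invisible_mark
are made up for illustration, and unlike the fragments above it tests
for the invisible marks first, so the result does not depend on what
iswprint(3) says about them in the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: bidi marks and controls, zero-width characters,
 * soft hyphen and BOM; all of these get dropped from the output */
static int is_invisible_mark(wchar_t wc){
   return (wc == 0x200E || wc == 0x200F ||
      (wc >= 0x202A && wc <= 0x202E) ||
      wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D) ||
      wc == 0xFEFF);
}

/* Hypothetical helper: copy src to dst (room for dstlen wide characters
 * including the terminator) in a form that is safe to display.  isuni
 * says whether wchar_t holds real Unicode.  Returns the number of wide
 * characters stored, not counting the terminator. */
size_t sanitize_name(wchar_t *dst, size_t dstlen, const wchar_t *src,
      int isuni){
   size_t n = 0;

   assert(dst != NULL && src != NULL && dstlen > 0);

   for(; *src != L'\0' && n + 1 < dstlen; ++src){
      wchar_t wc = *src;

      if(isuni && is_invisible_mark(wc))
         continue;

      if(!iswprint((wint_t)wc) && wc != L'\n' && wc != L'\t'){
         if((wc & ~(wchar_t)037) == 0) /* C0 control: U+2400 + value */
            wc = isuni ? (wchar_t)(0x2400 | wc) : L'?';
         else if(wc == 0177) /* DEL: U+2421 SYMBOL FOR DELETE */
            wc = isuni ? (wchar_t)0x2421 : L'?';
         else /* Anything else unprintable: U+2426 */
            wc = isuni ? (wchar_t)0x2426 : L'?';
      }
      dst[n++] = wc;
   }
   dst[n] = L'\0';
   return n;
}
```

Newline and tab pass through unchanged, as in the fragment above; a
lister that prints one name per line would likely want to map newline
as well.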

 |* If you want to output filenames in human-readable form with all special
 |characters escaped, then GNU tar has the `--quoting-style` option that
 |busybox can consider implementing too, but keep in mind that this is meant
 |for _output_ only, not for secure _input_ of filenames. (Besides, I don't
 |know if it would escape problematic Unicode control characters. There was a
 |Unicode Bidi vulnerability nicknamed "Trojan Source" that you might be
 |interested in.)

Yes.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
