>>>>> On Mon, 20 Nov 2017, Ulrich Mueller wrote:

>>>>> On Mon, 20 Nov 2017, Michał Górny wrote:
>> All paths specified in the Manifest file must consist of characters
>> corresponding to valid UTF-8 code points excluding the NULL character
>> (``U+0000``) and characters classified as whitespace in the current
>> version of the Unicode standard [#UNICODE]_. It is an error to use
>> Manifest files in directories containing files whose names contain
>> the disallowed characters.

> See above. I believe that NUL and ASCII whitespace (i.e. characters
> 09 0a 0b 0c 0d 20) should be excluded, but excluding byte sequences
> like "e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE
> MARK") doesn't make sense.

Thinking about it, this still looks too complicated. So, exclude only
SPACE (0x20), which is used as the separator between fields. (NUL can
be excluded too, but it won't occur anyway.)

In fact, all Manifest files in the tree are ASCII only. So
alternatively, filenames could be restricted to printable ASCII.
This is also what GLEP 31 [1] says:

| Suitable Characters for File and Directory Names
|
| Characters outside the ASCII 0..127 range cannot safely be used for
| file or directory names. (Of course, not all characters inside the
| ASCII 0..127 range can be used safely either.)

Ulrich

[1] Character Sets for Portage Tree Items
    https://www.gentoo.org/glep/glep-0031.html
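
To make the two alternatives concrete, here is a rough Python sketch of
what a path check could look like. It is only an illustration and not
taken from Portage or any existing tool: the function name
path_violations and the ascii_only switch are made up for this example.
With ascii_only=False it rejects only SPACE (0x20) and NUL; with
ascii_only=True it additionally rejects everything outside 0x21..0x7e,
i.e. printable ASCII minus the field separator (that exact range is my
assumption of what "printable ASCII" would mean here).

```python
def path_violations(path, ascii_only=False):
    """Return (index, character) pairs for characters not allowed in a path.

    Hypothetical helper for illustration only.
    Minimal rule: reject SPACE (field separator) and NUL.
    ascii_only=True: also reject anything outside 0x21..0x7e.
    """
    bad = []
    for index, ch in enumerate(path):
        if ch in (" ", "\0"):
            # SPACE separates Manifest fields; NUL is excluded for safety.
            bad.append((index, ch))
        elif ascii_only and not (0x21 <= ord(ch) <= 0x7E):
            # Stricter variant: control characters and non-ASCII are rejected.
            bad.append((index, ch))
    return bad


if __name__ == "__main__":
    for name in ("files/fix-build.patch", "files/my patch.diff", "files/a\u1680b"):
        print(repr(name), path_violations(name), path_violations(name, ascii_only=True))
```

Under the minimal rule a name containing U+1680 passes, since only 0x20
is treated as a separator; under the ASCII-only rule it is rejected
along with every other non-ASCII character.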