>>>>> On Mon, 20 Nov 2017, Ulrich Mueller wrote:

>>>>> On Mon, 20 Nov 2017, Michał Górny wrote:
>> All paths specified in the Manifest file must consist of characters
>> corresponding to valid UTF-8 code points excluding the NULL character
>> (``U+0000``) and characters classified as whitespace in the current
>> version of the Unicode standard [#UNICODE]_. It is an error to use
>> Manifest files in directories containing files whose names contain
>> the disallowed characters.

> See above. I believe that NUL and ASCII whitespace (i.e. characters
> 09 0a 0b 0c 0d 20) should be excluded, but excluding byte sequences
> like "e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE
> MARK") doesn't make sense.

Thinking about it, this still looks too complicated. So, exclude only
SPACE (0x20), which is used as the separator between fields. (NUL can
be excluded too, but it won't occur anyway.)
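
As a rough sketch of how little checking that rule needs (Python; the
function name is made up for illustration and is not an existing
Portage API), the test works on raw bytes and never has to consult the
Unicode whitespace tables:

```python
def path_is_allowed(path: bytes) -> bool:
    """Return True if the path contains neither SPACE (0x20) nor NUL (0x00).

    Hypothetical helper: only the field separator and NUL are rejected,
    every other byte sequence is accepted as-is.
    """
    return b" " not in path and b"\x00" not in path
```

With this, a name containing "e1 9a 80" (U+1680) would be accepted,
because none of its bytes is 0x20.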

In fact, all Manifest files in the tree are ASCII only.
So alternatively, filenames could be restricted to printable ASCII.
This is also what GLEP 31 [1] says:

| Suitable Characters for File and Directory Names
|
| Characters outside the ASCII 0..127 range cannot safely be used for
| file or directory names. (Of course, not all characters inside the
| ASCII 0..127 range can be used safely either.)
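
The stricter alternative could be sketched like this (Python again;
the function name is hypothetical, and treating "printable ASCII" as
0x21..0x7e, i.e. without SPACE since it is the field separator, is my
assumption rather than something GLEP 31 spells out):

```python
def is_printable_ascii_path(path: str) -> bool:
    """Return True if every character is in the range 0x21..0x7e.

    Assumption: SPACE (0x20) is left out because it separates Manifest
    fields; control characters and anything above 0x7e are rejected too.
    """
    return all(0x21 <= ord(ch) <= 0x7E for ch in path)
```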

Ulrich


[1] Character Sets for Portage Tree Items
    https://www.gentoo.org/glep/glep-0031.html
