commit:     1b4daf535fc27f6ca28219ca9b71a9b9ab5d775b
Author:     Michał Górny <mgorny <AT> gentoo <DOT> org>
AuthorDate: Thu Nov 23 18:44:54 2017 +0000
Commit:     Michał Górny <mgorny <AT> gentoo <DOT> org>
CommitDate: Sat Nov 25 20:49:17 2017 +0000
URL:        https://gitweb.gentoo.org/data/glep.git/commit/?id=1b4daf53

glep-0074: Make extended filename encoding optional

 glep-0074.rst | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/glep-0074.rst b/glep-0074.rst
index 6db6caa..5270b7a 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -142,8 +142,15 @@ corresponding to valid UTF-8 code points excluding the backwards slash
 (``\``) and characters classified as control characters and whitespace
 in the current version of the Unicode standard [#UNICODE]_.
 
-Any of the excluded characters that are present in path must be encoded
-using one of the following escape sequences:
+The implementation can optionally support extended filename encoding
+to support those paths. If the encoding is not supported,
+the implementation must reject directories containing any files using
+non-compliant names, as well as Manifest files whose filename field
+contains such filenames.
+
+If the encoding is supported, then all of the excluded characters that
+are present in path must be encoded using one of the following escape
+sequences:
 
 - characters in the ``U+0000`` to ``U+007F`` range can be encoded
   as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal
@@ -615,6 +622,13 @@ by attempting to locate the size field and take everything before it
 as filename. This was terribly fragile and even if it worked, it would
 solve the problem only partially.
 
+To preserve compatibility with the current implementations and given
+that all of the listed characters are not allowed for the foreseeable
+Gentoo uses, the extended encoding support is optional. If such support
+is not provided, the implementation must unconditionally reject any
+such files. Ignoring them implicitly would be confusing, and it is
+not possible to use them in explicit ``IGNORE`` entries.
+
 The character encoding method provides means to overcome the character
 restrictions to extend the tool usability beyond immediate Gentoo uses.
 The backslash escape form based on Python unicode strings is used
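
For illustration only, the rejection rule this change introduces for implementations without extended encoding support could be sketched roughly as below. The helper names ``is_compliant`` and ``check_names`` are hypothetical and not part of the GLEP or of any existing tool. The sketch assumes that Python's ``unicodedata`` category ``Cc`` and ``str.isspace()`` are an acceptable approximation of "control characters and whitespace"; they reflect the Unicode version bundled with the interpreter, which may not be the current one.

```python
import unicodedata


def is_compliant(name: str) -> bool:
    """Return True if the filename contains none of the excluded characters.

    Hypothetical helper: excluded characters are the backslash, control
    characters (approximated by general category Cc) and whitespace
    (approximated by str.isspace()).
    """
    for ch in name:
        if ch == "\\":
            return False
        if unicodedata.category(ch) == "Cc" or ch.isspace():
            return False
    return True


def check_names(names):
    """Reject the whole set of names if any of them is non-compliant.

    Mirrors the behaviour required when extended encoding is not
    supported: fail loudly instead of silently skipping the file.
    """
    for name in names:
        if not is_compliant(name):
            raise ValueError(f"non-compliant filename: {name!r}")
```

A caller would run ``check_names`` over both the directory listing and the filename fields read from a Manifest, treating a ``ValueError`` as a verification failure.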
