>>>>> On Mon, 20 Nov 2017, Michał Górny wrote:

> New changes:

> 9d819c9 glep-0074: Disallow filenames containing whitespace
> 4124b2f glep-0074: Explicitly specify UTF-8 encoding
> 7f9bd9f glep-0074: Include suggestions from Daniel Campbell

Here are a few comments (quoting below only the parts of the text
referenced by them):

> The Manifest files use UTF-8 encoding.

I don't understand the purpose of that requirement. The only place
where bytes outside of the ASCII range can occur is in the names of
distfiles, and these should simply be passed through transparently.
Otherwise, you would have to reject any sequence of non-ASCII bytes
that doesn't form a valid UTF-8 sequence, which looks like an
arbitrary restriction to me.
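
To make the consequence concrete, here is a minimal Python sketch. The
function name and the sample distfile name are hypothetical and not
taken from the GLEP; the sketch only shows that a strict UTF-8
requirement forces a tool to reject names that are perfectly usable as
opaque bytes.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same (made-up) distfile name in two encodings.
latin1_name = "café-1.0.tar.gz".encode("latin-1")  # contains the lone byte 0xe9
utf8_name = "café-1.0.tar.gz".encode("utf-8")      # contains the bytes 0xc3 0xa9

print(is_valid_utf8(latin1_name))
print(is_valid_utf8(utf8_name))
```

A tool that passes the bytes through transparently would never need
such a check in the first place.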

> It is an error for a single file to be matched by multiple entries
> of different semantics, file size or checksum values. It is an error
> to specify another entry for a file matching ``IGNORE``, or one of its
> subdirectories.

What about regular files in a directory (or subdirectory) matched by
IGNORE? This case does not seem to be covered.

> All paths specified in the Manifest file must consist of characters
> corresponding to valid UTF-8 code points excluding the NULL character
> (``U+0000``) and characters classified as whitespace in the current
> version of the Unicode standard [#UNICODE]_. It is an error to use
> Manifest files in directories containing files whose names contain
> the disallowed characters.

See above. I believe that NUL and ASCII whitespace (i.e. characters 09
0a 0b 0c 0d 20) should be excluded, but excluding byte sequences like
"e1 9a 80" (which is the UTF-8 encoding for U+1680 "OGHAM SPACE MARK")
doesn't make sense.
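
As a sketch of the narrower rule I have in mind (names are
hypothetical, and Python's str.isspace() is only used as an
approximation of "whitespace according to Unicode"), a byte-level check
for NUL and ASCII whitespace could look like this:

```python
# NUL plus the six ASCII whitespace bytes: 09 0a 0b 0c 0d 20.
ASCII_WHITESPACE = b"\t\n\x0b\x0c\r "
FORBIDDEN = frozenset(ASCII_WHITESPACE + b"\x00")


def has_forbidden_byte(path: bytes) -> bool:
    """Return True if the path contains NUL or ASCII whitespace."""
    return any(byte in FORBIDDEN for byte in path)


# "e1 9a 80" is the UTF-8 encoding of U+1680 OGHAM SPACE MARK.
ogham = b"foo\xe1\x9a\x80bar"

print(has_forbidden_byte(b"foo bar"))  # ASCII space is rejected
print(has_forbidden_byte(ogham))       # passes the byte-level check
print(ogham.decode("utf-8")[3].isspace())  # but counts as whitespace in Unicode terms
```

The byte-level check needs no Unicode tables and does not change when
a new version of the standard is released.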

> During the verification process, the client should compare the timestamp
> against the update time obtained from a local clock or a trusted time
> source. If the comparison result indicates that the Manifest at the time
> of receiving was already significantly outdated, the client should
> either fail the verification or require manual confirmation from user.

s/from user./from the user./

> ``TIMESTAMP <iso8601>``
>   Specifies a timestamp of when the Manifest file was last updated.
>   The timestamp must be a valid second-precision ISO8601 extended format

s/ISO8601/ISO 8601/
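
For what it's worth, a rough Python sketch of the staleness check
described in the paragraph quoted further up. The format string is
inferred from the example "TIMESTAMP 2017-10-30T10:11:12Z" in the
draft, and the one-day threshold is purely my assumption, since the
text does not define what "significantly outdated" means.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=1)  # assumed threshold, not specified by the draft


def parse_timestamp(value: str) -> datetime:
    """Parse a second-precision UTC timestamp such as 2017-10-30T10:11:12Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_outdated(value: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the Manifest timestamp is older than max_age at 'now'."""
    return now - parse_timestamp(value) > max_age


# 'now' would come from the local clock or a trusted time source.
now = datetime(2017, 11, 5, 0, 0, 0, tzinfo=timezone.utc)
print(is_outdated("2017-10-30T10:11:12Z", now))
```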

> ``IGNORE <path>``
>   Ignores a subdirectory or file from Manifest checks. If the specified
>   path is present, it and its contents are omitted from the Manifest
>   verification (always pass). *Path* must be a plain file or directory
>   path without a trailing slash, and must not contain wildcards.

What does that mean? Are wildcards not special (so "foo*" will match
literally), or are wildcard characters like "*" not allowed at all?
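
The two readings lead to different behaviour, as the following Python
sketch shows. The helper names and the set of characters treated as
wildcards ("*", "?", "[") are my own assumptions for illustration; the
draft specifies neither.

```python
import fnmatch


def covered_literal(ignore_path: str, path: str) -> bool:
    """Reading 1: the IGNORE path is compared literally.

    A path is covered if it equals the IGNORE path or lies below it.
    """
    return path == ignore_path or path.startswith(ignore_path + "/")


def contains_wildcard(ignore_path: str) -> bool:
    """Reading 2: such an IGNORE path would be rejected as an error."""
    return any(char in ignore_path for char in "*?[")


print(covered_literal("foo*", "foobar"))        # literal match: not covered
print(fnmatch.fnmatchcase("foobar", "foo*"))    # glob match, for contrast
print(contains_wildcard("foo*"))                # would be an error under reading 2
print(covered_literal("distfiles", "distfiles/a.tar.gz"))
```

Under the literal reading, a file inside an ignored directory is
covered simply because its path starts with the IGNORE path.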

> ``AUX <filename> <size> <checksums>...``
>   Equivalent to the ``DATA`` type, except that the filename is relative
>   to ``files/`` subdirectory.

s/to/to the/

> 3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
>    files according to `file verification`_ section, and include their

s/according to/according to the/

> 6. Verify the entries in *covered* set for incompatible duplicates

s/in *covered* set/in the *covered* set/

> 7. Verify all the files in the union of the *present* and *covered*
>    sets, according to `file verification`_ section.

s/to/to the/

>    a. If a ``IGNORE`` entry in the ``Manifest`` file covers
>       the *original* directory (or one of the parent directories), stop.

s/a ``IGNORE`` entry/an ``IGNORE`` entry/

> An example top-level Manifest file for the Gentoo repository would have
> the following content::

>     TIMESTAMP 2017-10-30T10:11:12Z
>     IGNORE distfiles
>     IGNORE local
>     IGNORE lost+found
>     IGNORE packages
>     MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
>     ...
>     MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
>     ...

> An example modern Manifest (disregarding backwards compatibility)
> for a package directory would have the following content::

>     DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
>     DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
>     DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
>     DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
>     DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
>     DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
>     DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..

Update hashes to BLAKE2B SHA512?

> This specification aims to avoid arbitrary restrictions. For this
> reason, the filename characters are only restricted by excluding two

s/the filename characters/filename characters/

> technically problematic groups:

> 1. The NULL character (``U+0000``) is normally used to indicate the end
>    of a null-terminated string. Its use could therefore break programs
>    written using C. Furthermore, it is not allowed in any known
>    filesystem.

> 2. The whitespace characters are used to separate Manifest fields. While

s/The whitespace characters/Whitespace characters/

> 2. being able to run update automatically generated files locally
>    without causing unnecessary verification failures.

Strike the word "run"?

> Strictly speaking, this information is already provided by the various
> ``metadata/timestamp*`` files that are already present. However,

"already" occurs twice in this sentence.

> The OpenPGP cleartext signature covers the contents of the Manifest,
> and is therefore compressed along with them. The possibility of using
> detached signature has been considered but it was rejected as

s/detached signature/a detached signature/

> The existence of additional entries for uncompressed Manifest checksums
> was debated. However, plain entries for the uncompressed file would
> be confusing if only the compressed file existed, and conflicting
> if both uncompressed and compressed variants existed. Furthermore,
> it has been pointed out that ``DIST`` entries do not have uncompressed
> variant either.

s/uncompressed variant/an uncompressed variant/

> .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
>    at the time of writing are duplicate, representing a 2 MiB
>    out of 25 MiB of DIST entries altogether.

s/a 2 MiB/2 MiB/

> Copyright
> =========

There should be two blank lines before this section heading (as
required by GLEP 2).

