On Mon, 20 Nov 2017 at 22:37 +0100, Ulrich Mueller wrote:
> > > > > > On Mon, 20 Nov 2017, Michał Górny wrote:
> > New changes:
> > 9d819c9 glep-0074: Disallow filenames containing whitespace
> > 4124b2f glep-0074: Explicitly specify UTF-8 encoding
> > 7f9bd9f glep-0074: Include suggestions from Daniel Campbell
>
> Here are a few comments (quoting below only the parts of the text
> referenced by them):
>
> > The Manifest files use UTF-8 encoding.
>
> I don't understand the purpose of that requirement. The only place
> where bytes outside of the ASCII range can occur are names of
> distfiles, and these should simply be passed transparently. Otherwise,
> you would have to reject any sequence of non-ASCII bytes that doesn't
> form a valid UTF-8 sequence, which looks like an arbitrary restriction
> to me.

Let me reply in parts.

Why not plain ASCII? Because the spec tries to avoid entirely arbitrary
restrictions, and forcing everyone to use just ASCII certainly counts as
one.

Why not plain bytestrings? Mostly because they are a real PITA to work
with in Python. Besides, you can't allow arbitrary bytestrings anyway,
since you still need to apply restrictions that make them safe to parse
in a text context, i.e. forbid 0x20, 0x0A, possibly more. That makes
the definition kinda silly in the end. Not to mention transferring
files between systems which can recode filenames but will not recode
Manifest contents.

Why UTF-8 then? Because it's quite reliable and widely established. It
works for most people out of the box. Those who use other encodings can
usually transcode reliably. It's what we're using in ebuilds and
everywhere else per GLEP 31, so I don't think we should make Manifests
any different.

> > It is an error for a single file to be matched by multiple entries
> > of different semantics, file size or checksum values. It is an error
> > to specify another entry for a file matching ``IGNORE``, or one of its
> > subdirectories.
>
> What about regular files in a directory (or subdirectory) matched by
> IGNORE? Looks like this case is not covered (?).

Ignored regular files must not have any other (e.g. DATA) entries.
Otherwise the expected behavior is unclear: are we supposed to verify
the file or ignore it?

> > All paths specified in the Manifest file must consist of characters
> > corresponding to valid UTF-8 code points excluding the NULL character
> > (``U+0000``) and characters classified as whitespace in the current
> > version of the Unicode standard [#UNICODE]_. It is an error to use
> > Manifest files in directories containing files whose names contain
> > the disallowed characters.
>
> See above. I believe that NUL and ASCII whitespace (i.e.
> characters 09 0a 0b 0c 0d 20) should be excluded, but excluding byte
> sequences like "e1 9a 80" (which is the UTF-8 encoding for U+1680
> "OGHAM SPACE MARK") doesn't make sense.

The restriction is intentionally wider, to prevent problems with
implementations which e.g. use Python's str.split() or the '\S' regular
expression class (Portage). When working in Unicode-compliant mode,
those treat additional characters as whitespace, and I'm rejecting them
to be on the safe side.

> > During the verification process, the client should compare the timestamp
> > against the update time obtained from a local clock or a trusted time
> > source. If the comparison result indicates that the Manifest at the time
> > of receiving was already significantly outdated, the client should
> > either fail the verification or require manual confirmation from user.
>
> s/from user./from the user./
>
> > ``TIMESTAMP <iso8601>``
> >   Specifies a timestamp of when the Manifest file was last updated.
> >   The timestamp must be a valid second-precision ISO8601 extended format
>
> s/ISO8601/ISO 8601/

Both done.

> > ``IGNORE <path>``
> >   Ignores a subdirectory or file from Manifest checks. If the specified
> >   path is present, it and its contents are omitted from the Manifest
> >   verification (always pass). *Path* must be a plain file or directory
> >   path without a trailing slash, and must not contain wildcards.
>
> What does that mean? Wildcards are not special (so "foo*" will match
> literally), or wildcard characters like "*" are not allowed at all?

Not special. Will reword to:

| Wildcards are not supported and wildcard characters are interpreted
| literally.

> > ``AUX <filename> <size> <checksums>...``
> >   Equivalent to the ``DATA`` type, except that the filename is relative
> >   to ``files/`` subdirectory.
>
> s/to/to the/
>
> > 3. Process all ``MANIFEST`` entries, recursively.
> >    Verify the Manifest files according to `file verification`_ section,
> >    and include their
>
> s/according to/according to the/
>
> > 6. Verify the entries in *covered* set for incompatible duplicates
>
> s/in *covered* set/in the *covered* set/
>
> > 7. Verify all the files in the union of the *present* and *covered*
> >    sets, according to `file verification`_ section.
>
> s/to/to the/
>
> > a. If a ``IGNORE`` entry in the ``Manifest`` file covers
> >    the *original* directory (or one of the parent directories), stop.
>
> s/a ``IGNORE`` entry/an ``IGNORE`` entry/

All done.

> > An example top-level Manifest file for the Gentoo repository would have
> > the following content::
> >
> >     TIMESTAMP 2017-10-30T10:11:12Z
> >     IGNORE distfiles
> >     IGNORE local
> >     IGNORE lost+found
> >     IGNORE packages
> >     MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
> >     ...
> >     MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
> >     ...
> >
> > An example modern Manifest (disregarding backwards compatibility)
> > for a package directory would have the following content::
> >
> >     DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
> >     DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
> >     DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
> >     DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
> >     DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
> >     DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
> >     DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
>
> Update hashes to BLAKE2B SHA512?

I don't really want to jump through the hoop of updating the first two
bytes of each hash, and I don't think it'd be nice to replace the key
while keeping an incorrect value. Especially since it would not serve
any purpose.

> > This specification aims to avoid arbitrary restrictions.
> > For this reason, the filename characters are only restricted by
> > excluding two
>
> s/the filename characters/filename characters/
>
> > technically problematic groups:
> > 1. The NULL character (``U+0000``) is normally used to indicate the end
> >    of a null-terminated string. Its use could therefore break programs
> >    written using C. Furthermore, it is not allowed in any known
> >    filesystem.
> > 2. The whitespace characters are used to separate Manifest fields. While
>
> s/The whitespace characters/Whitespace characters/
>
> > 2. being able to run update automatically generated files locally
> >    without causing unnecessary verification failures.
>
> Strike the word "run"?
>
> > Strictly speaking, this information is already provided by the various
> > ``metadata/timestamp*`` files that are already present. However,
>
> Twice "already" in this sentence.
>
> > The OpenPGP cleartext signature covers the contents of the Manifest,
> > and is therefore compressed along with them. The possibility of using
> > detached signature has been considered but it was rejected as
>
> s/detached signature/a detached signature/
>
> > The existence of additional entries for uncompressed Manifest checksums
> > was debated. However, plain entries for the uncompressed file would
> > be confusing if only the compressed file existed, and conflicting
> > if both uncompressed and compressed variants existed. Furthermore,
> > it has been pointed out that ``DIST`` entries do not have uncompressed
> > variant either.
>
> s/uncompressed variant/an uncompressed variant/
>
> > .. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
> >    at the time of writing are duplicate, representing a 2 MiB
> >    out of 25 MiB of DIST entries altogether.
>
> s/a 2 MiB/2 MiB/
>
> > Copyright
> > =========
>
> There should be two blank lines before this section heading (as
> required by GLEP 2).
>
> Ulrich

All fixed.

-- 
Best regards,
Michał Górny
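
For illustration, here is a minimal Python sketch of the whitespace point discussed above. It shows how text-mode str.split() and a '\S+' pattern split a field on U+1680, while byte-mode splitting does not. The Manifest line, the filename and the helper has_disallowed_chars() are hypothetical, made up for this example. Using str.isspace() is only an approximation of "whitespace in the current version of the Unicode standard"; this is not how Portage or any other implementation actually does it.

```python
import re

OGHAM_SPACE = "\u1680"  # OGHAM SPACE MARK, encoded in UTF-8 as e1 9a 80


def has_disallowed_chars(path: str) -> bool:
    """Hypothetical check: reject NUL and anything Python treats as whitespace.

    Assumption: str.isspace() is close enough to the Unicode whitespace
    classification for the purpose of this demonstration.
    """
    return any(ch == "\0" or ch.isspace() for ch in path)


# A made-up Manifest-like line whose filename contains U+1680.
line = f"DATA foo{OGHAM_SPACE}bar.patch 816"

# Text mode: U+1680 counts as whitespace, so the filename is split in two.
text_fields = line.split()

# Byte mode: only ASCII whitespace separates fields, the filename stays whole.
byte_fields = line.encode("utf-8").split()

# A str regular expression behaves like the text-mode split.
regex_fields = re.findall(r"\S+", line)

print(len(text_fields), len(byte_fields), len(regex_fields))
print(has_disallowed_chars(f"foo{OGHAM_SPACE}bar.patch"))
```

An implementation that tokenizes decoded text would therefore see an extra field in such a line, which is the failure mode the wider restriction is meant to rule out.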