On Thu, 02.11.2017 at 23:43 +0000, Robin H. Johnson wrote:
> On Thu, Nov 02, 2017 at 08:11:59PM +0100, Michał Górny wrote:
> > Next version. Now without MISC/OPTIONAL, and with many clarifications.
> 
> Huge improvements in this version, I found it much easier to understand.
> 
> Nits: 
> - please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
>   some monospace fonts.

Done. Also replaced '—' for consistency.

> 
> Further items inline:
> > Directory tree coverage
> > -----------------------
> 
> ...
> > The file entries (except for ``IGNORE``) can be specified for regular
> > files only. Symbolic links are followed when opening files
> > and traversing directories. It is an error to specify an entry for
> > a different file type. If the tree contains files of other types
> > that are not otherwise ignored, they need to be covered by an explicit
> > ``IGNORE``.
> > 
> > All the local (non-``DIST``) files covered by a Manifest tree must
> > reside on the same filesystem. It is an error to specify entries
> > applying to files on another filesystem. If subdirectories
> > that are not otherwise ignored reside on a different filesystem, they
> > must be explicitly excluded via ``IGNORE``.
> 
> I would prefer this to say:
> 'If files that are not otherwise ignored reside on a different
> filesystem', as expanded from sub-directories.  
> This implicitly forbids following a symlink that crosses a filesystem
> boundary, and then matches the similar part of 'Tree layout
> restrictions'.

I've gone for something even more explicit:

| If files or directories that are not otherwise ignored reside
| on a different filesystem, or symbolic links point to targets
| on a different filesystem, they must be explicitly excluded
| via ``IGNORE``.
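
A minimal sketch of how a tool might detect such entries, assuming
a POSIX-like system where ``st_dev`` identifies the filesystem. The function
name ``crosses_filesystem`` is hypothetical and not part of the GLEP:

```python
import os


def crosses_filesystem(root, path):
    """Return True if `path` resides on a different filesystem than `root`.

    os.stat() follows symbolic links, so a symlink whose target lives on
    another filesystem is reported as well. A dangling symlink raises
    OSError, which the caller is assumed to handle.
    """
    return os.stat(path).st_dev != os.stat(root).st_dev
```

Any path for which this returns True would then need an explicit ``IGNORE``
entry.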


> 
> > Rationale
> > =========
> 
> ...
> > Tree layout restrictions
> > ------------------------
> > 
> > The algorithm is meant to work primarily with ebuild repositories which
> > normally contain only files and directories. Directories provide
> > no useful metadata for verification, and specifying special entries
> > for additional file types is purposeless. Therefore, the specification
> > is restricted to dealing with regular files.
> > 
> > The Gentoo repository does not use symbolic links. Some Gentoo
> > repositories do, however. To provide a simple solution for dealing with
> > symlinks without having to take care to implement special handling for
> > them, the common behavior of implicitly resolving them is used.
> > Therefore, symbolic links to files are stored as if they were regular
> > files, and symbolic links to directories are followed as if they were
> > regular directories.
> > 
> > Dotfiles are implicitly ignored as that is a common notion used
> > in software written for POSIX systems. All other common filenames
> > require explicit ``IGNORE`` lines.
> 
> 'common' in the second sentence seems odd. What about uncommon
> filenames? Maybe just s/other common filenames/other filenames/.

Done. The idea was to say 'do not put IGNORE for corner cases which are
better handled via PM config' but I guess it's not necessary here.

> 
> > An ability to inject additional ignore entries is provided to account
> > for site configuration affecting the repository tree — placing
> > additional files in it, skipping some of the categories from syncing.
> 
> Mention that the package manager may provide wildcards or regex in the
> additional entries. Eg: 'IGNORE **/metadata.xml' 

Done.

| This configuration can extend beyond the limits of this GLEP,
| e.g. by allowing wildcards or regular expressions.
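
As an illustration only, a package manager could match such additional
entries with shell-style wildcards. The sketch below assumes Python's
``fnmatch`` semantics, where ``*`` also matches ``/`` (so ``**`` is not
special); ``is_ignored`` is a hypothetical helper:

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Return True if the repository-relative `path` matches any pattern."""
    # fnmatchcase() is case-sensitive on every platform, unlike fnmatch().
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical site configuration: ignore every metadata.xml below the root
# and one whole category.
site_ignores = ["**/metadata.xml", "games-*/*"]
```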

> 
> > Non-strict Manifest verification
> > --------------------------------
> 
> ...
> > The cases for stripping unnecessary files mostly focused around space
> > savings. For this purpose, stripping ``metadata.xml`` and similar files
> > has little value. It is much more common for users to strip whole
> > categories which can not be handled via the ``MISC`` type, and needs
> > a dedicated package manager mechanism. The same mechanism can also
> > handle files that used the ``MISC`` type.
> 
> Exclusion by package does happen as well. A list of categories or
> packages can be used for both the rsync exclusion and the IGNORE.

Rewritten to:

| It is much more common for users to strip whole packages
| or categories. The ``MISC`` type is not suitable for that,
| and so a dedicated package manager mechanism needs to be developed
| instead, possibly combining it with an rsync exclusion list. The same
| mechanism can also handle files that historically used the ``MISC``
| type.

But it's merely a rationale, so I'd rather not spend another hour trying
to cover every corner case in it.

> 
> > Splitting distfile checksums from file checksums
> > ------------------------------------------------
> > 
> > Another problem with the current Manifest format is that the checksums
> > for fetched files are combined with checksums for local files
> > in a single file inside the package directory. It has been specifically
> > pointed out that:
> > 
> > - since distfiles are sometimes reused across different packages,
> >   the repeating checksums are redundant,
> 
> Comment: 8.4% of all DIST entries are duplicate, representing a 2MiB
> saving in tree size (25MiB of DIST entries altogether).

Included as footnote:

.. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
   at the time of writing are duplicate, representing 2 MiB
   out of 25 MiB of DIST entries altogether.

> 
> > - mirror admins were interested in the possibility of verifying all
> >   the distfiles with a single tool.
> > 
> > This specification does not provide a clean solution to this problem.
> > It technically permits moving ``DIST`` entries to higher-level Manifests
> > but the usefulness of such a solution is doubtful.
> 
> This solution would require the package manager to consider
> higher-level Manifests or all Manifests in the tree when searching for
> the DIST entry. The most useful implementation of this would be for the
> git->rsync process to move all DIST entries elsewhere (metadata/ maybe).

Technically speaking, the package manager needs to consider parent
Manifests anyway in order to verify the deeper Manifests, and I think we
can reasonably assume it will keep them cached.
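
A rough sketch of such a lookup, assuming the parsed Manifests are already
cached in a dict keyed by repository-relative directory (``""`` being
the top level). Both the data layout and ``find_dist_entry`` are hypothetical:

```python
import posixpath


def find_dist_entry(manifests, package_dir, distfile):
    """Search the package's Manifest, then each parent's, for a DIST entry.

    `manifests` maps a repository-relative directory to a dict of
    {distfile name: checksum data}. Returns (directory, entry) for the
    deepest Manifest that lists the file, or None if none does.
    """
    directory = package_dir
    while True:
        entry = manifests.get(directory, {}).get(distfile)
        if entry is not None:
            return directory, entry
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the top of the tree
            return None
        directory = parent
```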

> 
> Either way, this would have many downsides, and make manual work on the
> Manifest DIST entries painful.

That's what 'doubtful usefulness' means ;-P.

-- 
Best regards,
Michał Górny

