On 20 October 2017 at 02:14, Thomas Kluyver <tho...@kluyver.me.uk> wrote:

> On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> > I’m in favor, although one question I guess is whether it should be a
> > PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> > without (2), it’s just another file in the .dist-info directory and that
> > doesn’t actually need to be standardized at all). I don’t think this will
> > be a very controversial PEP though, and it should be pretty easy.
>
> I have opened a PR to document what is already there, without adding any
> new features. I think this is worth doing even if we don't change
> anything, since it's a de facto standard that different tools use to
> interact.
>
> https://github.com/pypa/python-packaging-user-guide/pull/390
>
> We can still write a PEP for caching if necessary.
>

+1 for that approach (PR for the status quo, PEP for a shared metadata
caching design) from me.

Making the status quo more discoverable is valuable in its own right, and
the only decisions we'll need to make for that are terminology
clarification ones, not interoperability ones (this isn't like PEP 440 or
508 where we actually thought some of the default setuptools behaviour was
slightly incorrect and wanted to change it).

Figuring out a robust, cross-platform, network-file-system-tolerant metadata
caching design, on the other hand, is going to be hard, and as Donald
suggests, the right ecosystem level solution might be to define
install-time hooks for package installation operations.


> > I’m also in favor of this, although I would suggest SQLite rather than a
> > JSON file, the primary reason being that a JSON file isn’t
> > multiprocess-safe without being careful (and possibly introducing
> > locking), whereas SQLite has already solved that problem.
>
> SQLite was actually my first thought, but from experience in Jupyter &
> IPython I'm wary of it - its built-in locking does not work well over
> NFS, and it's easy to corrupt the database. I think careful use of
> atomic writing can be more reliable (though that has given us some
> problems too).
>
> That may be easier if there's one cache per user, though - we can
> perhaps try to store it somewhere that's not NFS.
>
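
(As an aside, the "atomic writing" being referred to there is essentially
the write-to-a-temp-file-then-rename dance. A minimal sketch, with a purely
illustrative cache path and JSON payload:

    import json
    import os

    def write_cache_atomically(cache, cache_path):
        # Keep the temp file next to the final location so the rename
        # never crosses a filesystem boundary
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(cache, f)
        os.replace(tmp_path, cache_path)  # atomic on POSIX and Windows

os.replace is atomic on both platforms when source and destination are on
the same filesystem, although, as noted above, that guarantee gets murkier
over NFS.)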

I'm wondering if, rather than jumping straight to a PEP, it may make sense
to initially pursue this idea as a *non-*standard, implementation-dependent
thing specific to the "entrypoints" project. There are a *lot* of
challenges to be taken into account for a truly universal metadata caching
design, and it would be easy to fall into the trap of coming up with a
design so complex that nobody can realistically implement it.

Specifically, I'm thinking of a usage model along the lines of the
updatedb/locate pair on *nix systems: `locate` gives you access to very
fast searches of your filesystem, but it *doesn't* try to automagically
keep its indexes up to date. Instead, refreshing the indexes is handled by
`updatedb`, and you can either rely on that being run automatically in a
cron job, or else force an update with `sudo updatedb` when you want to use
`locate`.

For a project like entrypoints, what that might look like is a reasonably
fast "cache freshness check" at *runtime*, where you scan the mtimes of all
the sys.path entries and compare them to the mtime of the cache. If the
cache looks up to date, then cool; otherwise, emit a warning about the stale
metadata cache and bypass it.
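
A rough sketch of that check, assuming a single per-environment cache file
(the cache location and name are invented for illustration; this isn't
anything entrypoints ships today):

    import os
    import sys
    import warnings

    # Illustrative location only - a real design would need per-user and
    # installation-wide variants as well
    CACHE_PATH = os.path.join(sys.prefix, "entrypoints-cache.json")

    def cache_is_fresh(cache_path=CACHE_PATH):
        """Return True if the cache is at least as new as every sys.path entry."""
        try:
            cache_mtime = os.stat(cache_path).st_mtime
        except OSError:
            return False  # no cache has been built yet
        for entry in sys.path:
            try:
                if os.stat(entry).st_mtime > cache_mtime:
                    return False  # something changed after the cache was built
            except OSError:
                continue  # skip nonexistent entries, zipimport paths, etc.
        return True

    if not cache_is_fresh():
        warnings.warn("entry points cache looks stale; falling back to a full scan")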

The entrypoints project itself could then expose a
`refresh-entrypoints-cache` command that could start out only supporting
virtual environments, and then extend to per-user caching, and then finally
(maybe) consider whether or not it wanted to support installation-wide
caches (with the extra permissions management and cross-process and
cross-system coordination that may imply).
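
In the same spirit, the rebuild side of such a command could be little more
than a scan over the *.dist-info directories on sys.path, combined with the
atomic-write helper sketched earlier (the JSON cache format is again made up
for illustration):

    import configparser
    import glob
    import os
    import sys

    def build_cache():
        """Collect entry point groups from every .dist-info on sys.path."""
        cache = {}
        for path_entry in sys.path:
            pattern = os.path.join(path_entry, "*.dist-info", "entry_points.txt")
            for ep_file in glob.glob(pattern):
                parser = configparser.ConfigParser(delimiters=("=",),
                                                   interpolation=None)
                parser.read(ep_file)
                for group in parser.sections():
                    cache.setdefault(group, {}).update(parser.items(group))
        return cache

A `refresh-entrypoints-cache` command would then just be `build_cache()`
followed by an atomic write into whichever scope (virtualenv, per-user, or
installation-wide) it is refreshing.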

Such an approach would also tie in nicely with Donald's suggestion of
reframing the ecosystem level question as "How should the entrypoints
project request that 'refresh-entrypoints-cache' be run after every package
installation or removal operation?", which in turn would integrate nicely
with things like RPM file triggers (where the system `pip` package could
set a file trigger that arranged for any properly registered Python package
installation plugins to be run for every modification to site-packages
while still appropriately managing the risk of running arbitrary code with
elevated privileges).
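
To make that concrete, one *hypothetical* shape for such a hook is an entry
point group that installers agree to invoke after modifying site-packages.
The "python.install_hooks" group name below is invented purely for
illustration; nothing like it is standardized today:

    # setup.py for a hypothetical plugin package registering an install hook
    from setuptools import setup

    setup(
        name="entrypoints-cache-hook",
        version="0.1",
        py_modules=["entrypoints_cache_hook"],
        entry_points={
            # Invented group name: installers would look this group up and
            # call each registered hook after an install/uninstall operation
            "python.install_hooks": [
                "refresh-entrypoints-cache = entrypoints_cache_hook:refresh",
            ],
        },
    )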

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia