Excerpts from Thomas Kluyver's message of 2017-10-18 15:52:00 +0100:
> We're increasingly using entry points in Jupyter to help integrate
> third-party components. This brings up a couple of things that I'd like
> to do:
>
> 1. Specification
>
> As far as I know, there's no document describing the details of entry
> points; it's a de-facto standard established by setuptools. It seems to
> work quite well, but it's worth writing down what is unofficially
> standardised. I would like to see a document on
> https://packaging.python.org/specifications/ saying:
>
> - Where build tools should put entry points in wheels
> - Where entry points live in installed distributions
> - The file format (including allowed characters, case sensitivity...)
>
> I guess I'm volunteering to write this, although if someone else wants
> to, don't let me stop you. ;-)
>
> I'd also be happy to hear that I'm wrong, that this specification
> already exists somewhere. If it does, can we add a link from
> https://packaging.python.org/specifications/ ?
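
For context, the format under discussion is the INI-style entry_points.txt
file that setuptools writes into a distribution's metadata directory. Below
is a minimal sketch of reading such a file with the standard library. The
group and entry names in SAMPLE are made up, parse_entry_points is a
hypothetical helper, and treating names as case-sensitive is an assumption:
that is one of the details the proposed document would need to settle.

```python
import configparser

# Made-up content in the layout setuptools uses for entry_points.txt:
# one [group] section per entry point group, then "name = module:object".
SAMPLE = """\
[console_scripts]
mytool = mypkg.cli:main

[myapp.plugins]
JSON = mypkg.plugins:JsonPlugin
json = mypkg.plugins:json_plugin
"""


def parse_entry_points(text):
    """Return {group: {name: object_reference}} for entry_points.txt text."""
    # Only "=" separates name from value, so a ":" is never read as a delimiter.
    # Interpolation is off so a literal "%" in a value is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # Assumption: names are case-sensitive (configparser lowercases by default).
    parser.optionxform = str
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


if __name__ == "__main__":
    for group, entries in parse_entry_points(SAMPLE).items():
        for name, target in entries.items():
            print(group, name, target)
```

With the default configparser settings, "JSON" and "json" would collide,
which is the kind of ambiguity a written specification would remove.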

I've always used the setuptools documentation as a reference. Are you
suggesting moving that information to a different location to
allow/encourage other tools to implement it as a standard?

> 2. Caching
>
> "There are only two hard problems in computer science: cache
> invalidation, naming things, and off-by-one errors"
>
> I know that caching is going to make things more complex, but at present
> a scan of available entry points requires a stat() for every installed
> package, plus open()+read()+parse for every installed package that
> provides entry points. This doesn't scale well, especially on spinning
> hard drives. By eliminating a call to pygments which caused an entry
> points scan, we cut the cold-start time of IPython almost in half on one
> HDD system (11s -> 6s; PR 10859).
>
> As packaging improves, the trend is to break functionality into more,
> smaller packages, which is only going to make this worse (though I hope
> we never end up with a left-pad package ;-). Caching could allow entry
> points to be used in places where the current performance penalty is too
> much.
>
> I envisage a cache working something like this:
> - Each directory on sys.path can have a cache file, e.g.
> 'entry-points.json'
> - I suggest JSON because Python can parse it efficiently, and it's not
> intended to be directly edited by humans. Other options? SQLite? Does
> someone want to do performance comparisons?
> - There is a command to scan all packages in a directory and build the
> cache file
> - After an install tool (e.g. pip) has added/removed packages from a
> directory, it should call that command to rebuild the cache.
> - A second command goes through all directories on sys.path and rebuilds
> their cache files - this lets the user rebuild caches if something has
> gone wrong.
> - Applications looking for entry points can choose from a range of
> behaviours depending on how important accuracy and performance are. E.g.
> ignore all caches, only use caches, use caches for directories where
> they exist, or try caches first and then scan packages if a key is
> missing.
>
> In the best case, when the caches exist and you trust them, loading them
> would cost one set of filesystem operations per sys.path entry, rather
> than per package.
>
> Thanks,
> Thomas

We've run into similar issues in some applications I work on. I had
intended to implement a caching layer within stevedore
(https://docs.openstack.org/stevedore/latest/) as a first step for
experimenting with approaches, but I would be happy to collaborate on
something further upstream if there's interest.

Doug
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
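
As a rough illustration of the per-directory cache proposed in the quoted
message, here is a minimal sketch under stated assumptions. It is not
stevedore's or setuptools' implementation. The file name entry-points.json
comes from the proposal; the function names, the JSON layout
({group: {name: target}}), and the "use the cache if it exists, otherwise
scan" policy are hypothetical choices made only for this example.

```python
import configparser
import json
import os

CACHE_NAME = "entry-points.json"  # file name suggested in the proposal


def _read_entry_points(path):
    """Parse one entry_points.txt file into {group: {name: target}}."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # assumption: names are case-sensitive
    parser.read(path, encoding="utf-8")
    return {group: dict(parser.items(group)) for group in parser.sections()}


def scan_directory(directory):
    """Slow path: check every metadata directory for an entry_points.txt."""
    found = {}
    if not os.path.isdir(directory):
        return found  # e.g. a zip file or a missing sys.path entry
    for name in sorted(os.listdir(directory)):
        if not name.endswith((".dist-info", ".egg-info")):
            continue
        ep_file = os.path.join(directory, name, "entry_points.txt")
        if os.path.isfile(ep_file):
            for group, entries in _read_entry_points(ep_file).items():
                # Simplification: on a name clash the later package wins.
                found.setdefault(group, {}).update(entries)
    return found


def rebuild_cache(directory):
    """What an installer could call after adding or removing packages."""
    data = scan_directory(directory)
    with open(os.path.join(directory, CACHE_NAME), "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


def entry_points_for(directory, use_cache=True):
    """Read the cache when present and allowed, otherwise scan packages."""
    cache = os.path.join(directory, CACHE_NAME)
    if use_cache and os.path.isfile(cache):
        with open(cache, encoding="utf-8") as f:
            return json.load(f)
    return scan_directory(directory)
```

An application would call entry_points_for(path) for each path in sys.path,
which costs one file read per directory when the caches exist. Passing
use_cache=False corresponds to the "ignore all caches" behaviour; the other
policies in the proposal would need extra logic that this sketch omits, as
does any detection of stale caches.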