Hi all,
I think there's a lot to gain for Python by improving PyPI, and I'm willing to help. I helped a bit with PyPI at last year's EuroPython sprint, and was then made aware of http://wiki.python.org/moin/CheeseShopDev - are these the most up-to-date plans for PyPI?

If you're in a hurry and don't want to read everything:

1) I've created a little app to help prototype how we can do better egg/package management at http://contrib.exoweb.net/trac/browser/egg/
2) I'd like feedback, and pointers to how I can help more.

Basically, the problems I would like to work on solving are:

1) Simplifying/enabling discovery of packages
2) Simplifying/enabling management of packages
3) Improving quality and usefulness of the package index

From a usability point of view I'd like to focus on the requirements for the Python newbie: someone who has just discovered Python, but is probably used to package management systems from Linux distributions, FreeBSD, and other dynamic languages like Perl and Ruby (these are also the systems I have experience with, so I'm pulling ideas from them).

Ideally everything should be (following Steve Krug's "Don't Make Me Think" recommendations) self-evident, and if that's not possible, at least self-explanatory. Someone put in front of a keyboard without having read any docs should be able to find, install, manage, and perhaps even create Python packages.

Better usability will of course benefit everyone, not just beginners. I'm frankly amazed at how many people who have programmed Python for years don't really know or use PyPI. I'm convinced that making more of Python's package system discoverable and easily accessible will greatly improve the adoption of Python, the number of Python packages, and the quality of these packages.
I think the typical use cases would be (in order of importance, based on what a typical user would encounter first):

* Find available eggs for a particular topic online
* Get more information about an egg
* Install an egg (and its dependencies)
* See which eggs are installed
* Upgrade some or all outdated eggs
* Remove/uninstall an egg
* Create an egg
* Find eggs that are plugins for some framework online

NAMING

So, first of all we'll need either one command, or a set of similarly named commands, to do discovery, installation, and management of packages, as these are common end-user actions. Creation of packages is a bit more advanced, and could be in another command.

If there's general agreement that Python eggs are the future way of distributing packages, why not call the command "egg", similar to the way many other package managers are named after the packages, e.g., rpm, port, gem? I'll assume that's the case.

Next, where do you find eggs? This might not be a big issue if the "egg" command is configured properly by default, but I'll offer my thoughts. I know the cheeseshop just changed its name back to PyPI again. In my opinion, neither name is good, in that neither helps people remember: any Monty Python connection is lost on most people, and PyPI is hard to spell, not very obvious, and a confusing clash with the also-prominent PyPy project.

Why not call the place for eggs just eggs? I.e., http://eggs.python.org/

So we'd have the command "egg" for managing eggs that are by default found at "eggs.python.org". I think it's hard to make Python package management more obvious than this. The goal is to get someone who is new to Python to remember how to get and where to find packages, so obvious is a good thing.

THE COMMAND LINE PACKAGE MANAGEMENT TOOL

The "egg" command should enable you to at least find, show info for, install, and uninstall packages.
I think the most common way to do command line tools like this is to offer sub-commands, a la bzr, port, svn, apt-get, gem, so I suggest:

    egg            - print help listing the commands
    egg search     - search for eggs (aliases: find/list)
    egg info       - show info for egg (aliases: show/details)
    egg install    - install named eggs
    egg uninstall  - uninstall eggs (aliases: remove/purge/delete)

so you can do:

    egg search bittorrent

to find all packages that have anything to do with bittorrent (full-text search of the package index), and then:

    egg install iTorrent

to actually download and install the package.

PROTOTYPE

I've built a command that works this way, implementing most (except the last) of the use cases at least partially. You can give it a go as follows:

    # install prerequisites on your platform
    # e.g., sudo apt-get install python-setuptools sqlite3 libsqlite3-0 python-pysqlite2
    svn co http://contrib.exoweb.net/svn/egg/
    cd egg
    sudo python setup.py develop  # should install storm for you
    gzip -dc pypi.sql.gz | sqlite3 ~/.pythoneggs.db  # bootstrap cache
    egg sync  # update cache

It's still incomplete: it lacks tests, might only work on unix-y computers, and lacks support for lots of features like activation/deactivation and upgrades, but it works for basic stuff like finding, installing, and uninstalling packages.

Summary of the design:

* Local and PyPI package information is synchronized into a local sqlite database for easy access
* Storm is used for ORM (but could easily be changed)
* Installation is handled by passing off the "egg install" command to "easy_install"
* I'm using a non-standard command-line parser (but could easily be changed)
* For interactive use on terminals that support it, output is colorized and adjusted to fit

While doing the synchronization with PyPI I discovered a couple of issues, described below, that make the application unfit for common use for now. (E.g., it has to query PyPI for each of the packages.)
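To make the sub-command and alias layout proposed above concrete, here is a minimal sketch using the standard library's argparse. It is not the prototype's parser (which is described as non-standard); the names COMMANDS, build_parser, and parse are hypothetical, and the only behavior shown is mapping each alias back to its canonical command.

```python
import argparse

# Sub-commands and aliases as listed in the proposal: name -> (help, aliases).
# This table and the function names below are illustrative assumptions.
COMMANDS = {
    "search": ("search for eggs", ["find", "list"]),
    "info": ("show info for egg", ["show", "details"]),
    "install": ("install named eggs", []),
    "uninstall": ("uninstall eggs", ["remove", "purge", "delete"]),
}


def build_parser():
    parser = argparse.ArgumentParser(prog="egg")
    subparsers = parser.add_subparsers(dest="command")
    for name, (help_text, aliases) in COMMANDS.items():
        sub = subparsers.add_parser(name, aliases=aliases, help=help_text)
        # Remaining words are kept as-is; interpreting them comes later.
        sub.add_argument("args", nargs="*")
        # Record the canonical name so every alias dispatches identically.
        sub.set_defaults(canonical=name)
    return parser


def parse(argv):
    options = build_parser().parse_args(argv)
    if options.command is None:
        # A bare "egg" should list the available commands.
        return "help", []
    return options.canonical, options.args
```

With this sketch, parse(["find", "bittorrent"]) and parse(["search", "bittorrent"]) resolve to the same command, so handlers only need to know the canonical names.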
Most subcommands take arguments that can be a free mix of set names and query strings. I thought this would make for the most forgiving and user-friendly interface. These are filters; by default all eggs match.

SETS: Eggs have a few attributes that can be used to limit to a subset of all eggs, e.g., whether an egg is installed, active, outdated, local, or remote. Specifying several of these takes the intersection of the sets, i.e., it further limits the number of eggs.

QUERY STRINGS: If none of the set names are matched, the argument is assumed to be a query string. Many subcommands, like "search", do a full-text search of the package cache database. Others, like "list", will do a substring match of package names. Others, like "install", will require you to match the name exactly. You can specify a specific version by adding a slash, e.g., "name/version".

Here are some example commands:

    egg list installed sql     - list all installed eggs having sql in their name
    egg search installed sql   - list all installed eggs mentioning sql anywhere in the package metadata
    egg list outdated installed - list all outdated installed eggs
    egg list outdated active   - list all outdated and active (and installed) eggs
    egg uninstall outdated     - uninstall all outdated eggs
    egg info pysqlite          - show information about pysqlite
    egg info pysqlite/2.0.0    - show information about version 2.0.0 of pysqlite
    egg sync local             - rescan local packages and update cache db

PYPI IMPROVEMENT SUGGESTIONS

While building the application I discovered one important missing feature: PyPI doesn't offer a way to programmatically bulk-download information about all eggs, as is customary for many other packaging systems. This means "egg sync" will have to fetch the information for each package individually. I think it wouldn't be hard to offer a compressed XML file with all of the package information, suitable for download.
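As a rough illustration of why a single compressed file would help, the sketch below loads such a dump into a sqlite cache in one pass instead of one request per package. PyPI offers no such file, so the XML layout, the table schema, and the names load_bulk_index and SAMPLE_DUMP are all assumptions invented for this example.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical dump layout, invented for illustration only:
#   <packages>
#     <package name="..." version="..."><summary>...</summary></package>
#   </packages>


def load_bulk_index(compressed_bytes, connection):
    """Load every <package> entry of a gzip-compressed XML dump into sqlite."""
    root = ET.fromstring(gzip.decompress(compressed_bytes))
    connection.execute(
        "CREATE TABLE IF NOT EXISTS package "
        "(name TEXT, version TEXT, summary TEXT, PRIMARY KEY (name, version))"
    )
    rows = [
        (pkg.get("name"), pkg.get("version"), pkg.findtext("summary", default=""))
        for pkg in root.iter("package")
    ]
    # One transaction for the whole index, rather than one fetch per package.
    connection.executemany("INSERT OR REPLACE INTO package VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Stand-in for a downloaded dump; the package data here is made up.
SAMPLE_DUMP = gzip.compress(
    b"<packages>"
    b'<package name="alpha" version="1.0"><summary>First</summary></package>'
    b'<package name="beta" version="0.2"><summary>Second</summary></package>'
    b"</packages>"
)

cache = sqlite3.connect(":memory:")
loaded = load_bulk_index(SAMPLE_DUMP, cache)
```

Using INSERT OR REPLACE keyed on (name, version) means re-running the load with a newer dump would refresh existing rows rather than duplicate them.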
A minor nuisance is that there's no way to get only eggs/distributions; PyPI lists packages, and some packages don't even have any eggs. The "egg" command will try to download each of these empty packages at each sync (since it treats empty packages as "packages for which we haven't downloaded eggs yet"). It might be better to list eggs/distributions instead of packages.

There's a lot of opportunity in improving the consistency and usefulness of package metainformation. Once you have it all synced to a local SQLite database and start snooping around, it'll be pretty obvious; very few packages use the dependencies etc. (In fact, I think the dependencies/obsoletes definitions are overengineered; we could get by with just a simple package >= version number.)

Many people use other platform-specific packaging systems to manage Python packages, probably both because this handles dependencies on non-Python packages, and because PyPI hasn't been very useful or easy to use. One may even ask what the role of PyPI is, since it's never going to replace platform-specific packaging systems; should it then support them? How?

In any case, installing Python packages from different packaging systems would result in problems, and currently "egg" can't find Python packages installed using other systems. ("Yolk" has some support for discovering Python packages installed using Gentoo.)

Optional: These days XML-RPC (and the WS-Deathstar) seem to be losing steam to REST, so I think we'd gain a lot of "hackability" by enabling a REST interface for accessing packages.

Eventually we probably need to enforce package signing.

EGG IDEAS

It'd be good for "egg" to support both system- and user-wide configurations, and to support downloading from several package indexes, like apt-get does.

Perhaps "egg" should keep the uninstalled packages in a cache, like apt-get and, I believe, buildout.
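The system- and user-wide configuration idea, with several package indexes, could be sketched roughly as follows using the standard library's configparser. The prototype defines no configuration format, so the section name "egg", the keys "indexes" and "cache_dir", the default values, and the second index URL are all hypothetical.

```python
import configparser

# Hypothetical built-in defaults; the index URL is the one proposed above.
DEFAULTS = {"egg": {"indexes": "http://eggs.python.org/", "cache_dir": "~/.eggcache"}}


def load_config(system_text="", user_text=""):
    """Layer settings: built-in defaults, then system-wide, then per-user."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    parser.read_string(system_text)  # e.g. contents of a system-wide file
    parser.read_string(user_text)    # e.g. contents of a per-user file
    return parser["egg"]


def index_urls(settings):
    """Indexes are whitespace-separated and would be tried in the order listed."""
    return settings["indexes"].split()


# Example: the system adds a second index, the user only overrides the cache.
system_conf = "[egg]\nindexes = http://eggs.python.org/ http://eggs.example.org/\n"
user_conf = "[egg]\ncache_dir = /tmp/eggs\n"
settings = load_config(system_conf, user_conf)
```

Because later layers only override the keys they mention, a user file can change one setting without repeating the system-wide list of indexes.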
Perhaps "egg" should provide a simple web server to allow browsing (and perhaps installation from) local packages (I believe the Ruby guys have this). If this web server were discoverable via Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is to run an egg server (that people on the net auto-discover) and regularly download all packages.

How could "egg" work with "buildout"? Should buildout be used for project-specific egg installations?

Rgds,
Bjorn
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig