(warning, long post) Hi there,
As some of you already know, the packaging and distribution of scientific python packages have been a constant source of frustration. Open source is about making it easy for anyone to use software how they see fit, and I think the python packaging infrastructure has not been very successful for people not intimately familiar with python. A few weeks ago, after Guido visited Berkeley and was told how those issues were still there for the scientific community, he wrote an email asking whether current efforts on distutils-sig will be enough (see http://aspn.activestate.com/ASPN/Mail/Message/distutils-sig/3775972).

Several of us have been participating in this discussion, but I feel like the divide between current efforts on distutils-sig and us (the SciPy community) is not getting smaller. At best, their efforts will mean more work for us to track the new distribute fork; more likely, it will be all for nothing, as it won't solve any deep issue. To be honest, most of what is considered on distutils-sig sounds like anti-goals to me.

Instead of keeping up with the frustrating process of "improving" distutils, I think we have enough smart people and manpower in the scientific community to go with our own solution. I am convinced it is doable because R and haskell, with much smaller communities than python, managed to pull off something which is miles ahead of pypi. The SciPy community is hopefully big enough that a SciPy-specific solution may reach critical mass.

Ideally, I wish we had something with the following capabilities:

- easy to understand tools
- http-based package repository a la CRAN, which would be easy to mirror and backup (through rsync-like tools)
- decoupling the building, packaging and distribution of code and data
- reliable install/uninstall/query of what is installed locally
- facilities for building windows/mac os x binaries
- making the life of OS vendors (Linux, *BSD, etc...)
  easier

The packaging part
==================

Speaking is easy, so I started coding part of this toolset, called toydist (temporary name), which I presented at SciPy India a few days ago:

http://github.com/cournape/toydist/

Toydist is more or less a rip off of cabal (http://www.haskell.org/cabal/), and consists of three parts:

- a core which builds a package description from a declarative file similar to cabal files. The file is almost purely declarative, and can be parsed so that no arbitrary code is executed, thus making it easy to sandbox package builds (e.g. on a build farm).
- a set of command line tools to configure, build, install, build installers (egg only for now), etc... from the declarative file
- backward compatibility tools: a tool to convert existing setup.py to the new format has been written, and a tool to use distutils through the new format, for backward compatibility with complex distutils extensions, should be relatively easy to write.

The core idea is to make the format just rich enough to describe most packages out there, but simple enough that interfacing it with external tools is possible and reliable. As a regular contributor to scons, I am all too aware that a build tool is a very complex beast to get right, and repeating their efforts does not make sense. Typically, I envision that complex packages such as numpy, scipy or matplotlib would use make/waf/scons for the build - in a sense, toydist is written so that writing something like numscons would be easier. OTOH, most if not all scikits should be buildable from a purely declarative file.

To give you a feel of the format, here is a snippet for the grin package from Robert K. (automatically converted):

    Name: grin
    Version: 1.1.1
    Summary: A grep program configured the way I like it.
    Description:
        ====
        grin
        ====

        I wrote grin to help me search directories full of source code. The venerable GNU grep_ and find_ are great tools, but they fall just a little short for my normal use cases.
        <snip>
    License: BSD
    Platforms: UNKNOWN
    Classifiers:
        License :: OSI Approved :: BSD License,
        Development Status :: 5 - Production/Stable,
        Environment :: Console,
        Intended Audience :: Developers,
        Operating System :: OS Independent,
        Programming Language :: Python,
        Topic :: Utilities,
    ExtraSourceFiles:
        README.txt,
        setup.cfg,
        setup.py,

    Library:
        InstallDepends:
            argparse,
        Modules:
            grin,

    Executable: grin
        module: grin
        function: grin_main

    Executable: grind
        module: grin
        function: grind_main

Although still very much experimental at this point, toydist already makes some things much easier than with distutils/setuptools:

- path customization for any target can be done easily: you can easily add an option in the file so that configure --mynewdir=value works and is accessible at every step.
- making packages FHS compliant is not a PITA anymore, and the scheme can be adapted to any OS, be it traditional FHS-like unix, mac os x, windows, etc...
- all the options are accessible at every step (no more distutils commands nonsense)
- data files can finally be handled correctly and consistently, instead of the 5 or 6 magic methods currently available in distutils/setuptools/numpy.distutils
- building eggs does not involve setuptools anymore
- not much coupling between package description and build infrastructure (building extensions is actually done through distutils ATM).

Repository
==========

The goal here is to have something like CRAN (http://cran.r-project.org/web/views/), ideally with a build farm so that whenever anyone submits a package to our repository, it would automatically be checked, and built for windows/mac os x and maybe a few major linux distributions. One could investigate the build service from openSUSE to that end (http://en.opensuse.org/Build_Service), which uses Xen VMs to build installers in a reproducible way.

Installed package db
====================

I believe that the current open source enstaller package from Enthought can be a good starting point.
It is based on eggs, but eggs are only used as a distribution format (eggs are never installed as eggs AFAIK). You can easily remove packages, query installed versions, etc... Since toydist produces eggs, interoperation between toydist and enstaller should not be too difficult.

What's next ?
=============

At this point, I would like to ask for help and comments, in particular:

- Does all this make sense, or is it hopelessly intractable ?
- Besides the points I have mentioned, what else do you think is needed ?
- There has already been some work on the scikits webportal, but I think we should bypass pypi entirely (the current philosophy of not enforcing consistent metadata does not make much sense to me, and is the opposite of most other similar systems out there).
- I think a build farm for at least windows packages would be a killer feature, and enough incentive to push some people to use our new infrastructure. It would be good to have a windows guy familiar with windows sandboxing/virtualization to do something there. The people working on the opensuse build service have started working on windows support.
- I think being able to automatically convert most scientific packages is a significant feature, and it needs to be more robust - so anyone is welcome to try converting existing setup.py files with toydist (see the toydist readme).

thanks,

David

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
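
Illustrative note on the "can be parsed so that no arbitrary code is executed" point from the packaging part: below is a minimal Python sketch of reading a flat "Field: value" description as plain data. This is not toydist's actual parser or API. The function names (parse_description, split_list) are made up for illustration, and the sketch assumes a simplified subset of the format: top-level fields with indented continuation lines only, with no nested Library/Executable sections.

```python
# Hypothetical sketch: NOT toydist's real parser or API.
# Assumes a simplified subset of the format: flat "Field: value" lines,
# where indented lines continue the value of the previous field.

def parse_description(text):
    """Parse a simplified 'Field: value' description into a dict of strings."""
    fields = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace() and current is not None:
            # Indented line: append it to the previous field's value.
            fields[current] = (fields[current] + " " + raw.strip()).strip()
            continue
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError("expected 'Field: value', got %r" % raw)
        current = name.strip()
        fields[current] = value.strip()
    return fields


def split_list(value):
    """Split a comma-separated field value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


EXAMPLE = """\
Name: grin
Version: 1.1.1
Summary: A grep program configured the way I like it.
License: BSD
ExtraSourceFiles:
    README.txt,
    setup.cfg,
    setup.py,
"""

info = parse_description(EXAMPLE)
print(info["Name"], info["Version"])
print(split_list(info["ExtraSourceFiles"]))
```

The file is only ever read as text and split on colons and commas; nothing from the package is imported or executed, which is the property that would make it reasonable to process untrusted descriptions on a build farm.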