hello again,

On Tue, Dec 29, 2009 at 2:22 PM, David Cournapeau <courn...@gmail.com> wrote:
> On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield <ren...@gmail.com> wrote:
>> Hi,
>>
>> In the toydist proposal/release notes, I would address 'what does
>> toydist do better' more explicitly.
>>
>> **** A big problem for science users is that numpy does not work with
>> pypi + (easy_install, buildout or pip) and python 2.6. ****
>>
>> Working with the rest of the python community as much as possible is
>> likely a good goal.
>
> Yes, but it is hopeless. Most of what is being discussed on
> distutils-sig is useless for us, and what matters is ignored at best.
> I think most people on distutils-sig are misguided, and I don't think
> the community is representative of people concerned with packaging
> anyway - most of the participants seem to be around web development,
> and are mostly dismissive of others' concerns (OS packagers, etc...).
Sitting down with Tarek (one of the current distutils maintainers) in
Berlin, we had a little discussion about packaging over pizza and
beer... and he was quite mindful of OS packagers' problems and issues.
He was also interested to hear about game developers' issues with
packaging (which are different again from scientific users' needs...
but similar in many ways). However, these systems were developed by the
zope/plone/web crowd, so they are naturally going to be thinking a lot
about zope/plone/web issues.

Debian and ubuntu packages are mostly useless for them because of their
age. Waiting a couple of years for your package to be released is just
not an option (waiting even an hour for bug fixes is sometimes not an
option). Isolation of packages is also needed for machines that have
100s of different applications running, written by different people,
each with dozens of packages used by each application.

Tools like checkinstall and stdeb ( http://pypi.python.org/pypi/stdeb/ )
can help with older-style packaging systems like deb/rpm. I think if
toydist included something like stdeb - not as an extension to
distutils, but as a standalone tool (like toydist) - there would be
fewer problems with it.

One thing the various zope-related communities do is make sure all the
relevant and needed packages are built/tested by their compile farms.
This makes pypi work for them a lot better than a non-coordinated
effort does. There are also lots of people trying out new versions all
of the time.

> I want to note that I am not starting this out of thin air - I know
> most of distutils code very well, I have been the mostly sole
> maintainer of numpy.distutils for 2 years now. I have written
> extensive distutils extensions, in particular numscons which is able
> to fully build numpy, scipy and matplotlib on every platform that
> matters.
>
> Simply put, distutils code is horrible (this is an objective fact) and
> flawed beyond repair (this is more controversial).
> IMHO, it has
> almost no useful feature, except being standard.

yes, I have also battled with distutils over the years. However, it is
simpler than autotools (for me... maybe distutils has perverted my
fragile mind), and works on more platforms for python than any other
current system. It is much worse for C/C++ modules though. It needs
dependency and configuration tools for it to work better (like the
ones many C/C++ projects hack into distutils themselves).

Monkey patching and extensions are especially a problem... as is the
horrible code quality of distutils by modern standards. However,
distutils has had more tests and testing systems added, so
refactoring/cleaning up of distutils can happen more easily now.

> If you want a more detailed explanation of why I think distutils and
> all tools on top are deeply flawed, you can look here:
>
> http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations-cabal-for-a-solution/

I agree with many things in that post, except your conclusion on
multiple versions of packages in isolation. Package isolation is like
processes, and package sharing is like threads - and threads are evil!
Leave my python site-packages directory alone, I say... and especially
don't let setuptools infect it :)

Many people currently find that the multiple-versions-in-isolation
approach works well for them - so for some use cases the tools are
working wonderfully.

>> numpy used to work with buildout in python2.5, but not with 2.6.
>> buildout lets other team members get up to speed with a project by
>> running one command. It installs things in the local directory, not
>> system wide. So you can have different dependencies per project.
>
> I don't think it is a very useful feature, honestly. It seems to me
> that they created a huge infrastructure to split packages into tiny
> pieces, and then try to get them back together, imagining that
> multiple installed versions is a replacement for backward
> compatibility.
> Anyone with extensive packaging experience knows that's
> a deeply flawed model in general.

Science is supposed to allow repeatability. Without the same versions
of packages, repeating experiments is harder. This is a big problem in
science, and multiple versions of packages in _isolation_ can help get
us closer to a solution to the repeatability problem.

Just pick some random paper and try to reproduce their results. It's
generally very hard, unless the software is quite well packaged.
Especially for graphics-related papers, there are often many different
types of environments, so setting up those environments to try out the
techniques and verify the results quickly is difficult.

Multiple versions are not a replacement for backwards compatibility,
just a way to avoid the problem in the short term, so you don't get
blocked. If a new package version breaks your app, you can either pin
it to an old version, fix your app, or fix the package. It is also not
a replacement for building on stable, high-quality components, but it
helps you work with less stable and lower-quality components - at a
much faster rate of change, and with a much larger dependency list.

>> Plenty of good work is going on with python packaging.
>
> That's the opposite of my experience. What I care about is:
> - tools which are hackable and easily extensible
> - robust install/uninstall
> - real, DAG-based build system
> - explicitness and repeatability
>
> None of this is supported by the tools, and the current directions go
> even further away. When I have to explain at length why the
> command-based design of distutils is a nightmare to work with, I don't
> feel very confident that the current maintainers are aware of the
> issues, for example. It shows that they never had to extend distutils
> much.

All agreed! I'd add to the list parallel builds/tests (make -j 16), and
outputting to native build systems, e.g. xcode and msvc projects, and
makefiles.
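As an aside, the version pinning described above is what buildout's
[versions] section already does today. A minimal buildout.cfg sketch -
the part name and the version numbers are only illustrative, though
zc.recipe.egg is the stock recipe for installing eggs:

```ini
[buildout]
parts = analysis
versions = versions

[analysis]
recipe = zc.recipe.egg
eggs =
    numpy
    scipy

[versions]
# pin exact versions so the environment can be rebuilt identically
numpy = 1.3.0
scipy = 0.7.1
```

Running bin/buildout then installs exactly those versions into the
project's own directory, not system-wide site-packages.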
It would be interesting to know your thoughts on buildout recipes (see
creating recipes: http://www.buildout.org/docs/recipe.html ). They
seem to work better from my perspective. However, that is probably
because of isolation: a recipe is only used by those projects that
require it. So the chances of recipes interacting are lower, as they
are not installed in the main python.

How will you handle toydist extensions, so that multiple extensions do
not have problems with each other? I don't think this is possible
without isolation, and even then it's still a problem.

Note, the section in the distutils docs on creating command extensions
is only around three paragraphs. There is also no central place to go
looking for extra commands (that I know of), or a place to document and
share each other's command extensions. Many of the methods for
extending distutils are not very well documented either. For example,
'how do you change compiler command line arguments for certain source
files?' Basic things like that are possible with distutils, but not
documented (very well).

>> There are build farms for windows packages and OSX uploaded to pypi.
>> Start uploading pre releases to pypi, and you get these for free (once
>> you make numpy compile out of the box on those compile farms). There
>> are compile farms for other OSes too... like ubuntu/debian, macports
>> etc. Some distributions even automatically download, compile and
>> package new releases once they spot a new file on your ftp/web site.
>
> I am familiar with some of those systems (PPA and opensuse build
> service in particular). One of the goals of my proposal is to make it
> easier to interoperate with those tools.

yeah, cool.

> I think Pypi is mostly useless. The lack of enforced metadata is a big
> no-no IMHO. The fact that Pypi is miles behind CRAN for example is
> quite significant. I want CRAN for scientific python, and I don't see
> Pypi becoming it in the near future.
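For comparison, the buildout recipe interface mentioned above is small:
a recipe is just a class that buildout instantiates with the parsed
configuration, with install()/update() methods. A minimal sketch (the
class name and options are hypothetical; no base class is required):

```python
class DownloadRecipe:
    """Minimal buildout-style recipe sketch (hypothetical name)."""

    def __init__(self, buildout, name, options):
        self.buildout = buildout   # the whole parsed buildout config
        self.name = name           # this part's section name
        self.options = options     # this section's options dict

    def install(self):
        # ... do the real work here (download, build, write files) ...
        # Return the paths created, so buildout can uninstall them
        # cleanly on the next run.
        return []

    def update(self):
        # Called on re-runs when the part's options have not changed.
        pass
```

A buildout.cfg part would point at it with something like
`recipe = mypackage:DownloadRecipe` (an egg entry point; names here
are made up). This is the isolation property: only parts that name the
recipe ever load it.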
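On the 'how do you change compiler command line arguments for certain
source files?' question: one way is to subclass the build_ext command
and wrap the compiler's private _compile hook. A sketch, with a
hypothetical flag table - and note that relying on a private,
undocumented method is exactly the documentation problem being
complained about here:

```python
from setuptools.command.build_ext import build_ext

# Hypothetical table of extra flags for specific source files.
PER_FILE_FLAGS = {
    "simd_kernels.c": ["-O3", "-funroll-loops"],
}

def flags_for(source, base_flags):
    """Return base_flags plus any per-file extras for this source."""
    name = source.replace("\\", "/").rsplit("/", 1)[-1]
    return list(base_flags) + PER_FILE_FLAGS.get(name, [])

class build_ext_with_flags(build_ext):
    """build_ext that injects per-file compiler flags."""

    def build_extensions(self):
        original = self.compiler._compile  # private, undocumented hook
        def patched(obj, src, ext, cc_args, extra_postargs, pp_opts):
            original(obj, src, ext, cc_args,
                     flags_for(src, extra_postargs), pp_opts)
        self.compiler._compile = patched
        build_ext.build_extensions(self)
```

It would be wired in via cmdclass={"build_ext": build_ext_with_flags}
in setup(). The _compile signature above matches the unix compiler
class; other compiler classes differ, which is part of why this is
fragile.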
>
> The point of having our own Pypi-like server is that we could do the
> following:
> - enforcing metadata
> - making it easy to extend the service to support our needs

Yeah, cool. Many other projects have their own servers too -
pygame.org, plone, etc. - which meet their own needs. Patches are
accepted for pypi, btw.

What types of metadata enforcement, and how would they help? I imagine
this could be done in a number of ways for pypi:
- a distutils command extension that people could use.
- changing the pypi source code.
- checking the metadata for certain packages, then emailing their
  authors telling them about the issues.

>> pypm: http://pypm.activestate.com/list-n.html#numpy
>
> It is interesting to note that one of the maintainers of pypm has
> recently quit the discussion about Pypi, most likely out of
> frustration with the other participants.

yeah, big mailing list discussions hardly ever help, I think :)
oops, this is turning into one.

>> Documentation projects are being worked on to document, give tutorials
>> and make python packaging be easier all round. As witnessed by 20 or
>> so releases on pypi every day (and growing), lots of people are using
>> the python packaging tools successfully.
>
> This does not mean much IMO. Uploading on Pypi is almost required to
> use virtualenv, buildout, etc.. An interesting metric is not how many
> packages are uploaded, but how much it is used outside developers.

Yeah, it only means that there are lots of developers able to use the
packaging system to put their own packages up there. However, there are
over 500 science-related packages on there now - which is pretty cool.
A way to measure packages being used would be by downloads, and by
which packages depend on which other packages. I think the science ones
would be reused less than normal, since a much higher percentage are
C/C++ based, and are likely to be more fragile packages.

>> I'm not sure making a separate build tool is a good idea.
>> I think going with the rest of the python community, and improving
>> the tools there, is a better idea.
>
> It has been tried, and IMHO has been proven to have failed. You can
> look at the recent discussion (the one started by Guido in
> particular).

I don't think 500+ science-related packages is a total failure, really.

>> pps. some notes on toydist itself.
>> - toydist convert is cool for people converting a setup.py. This
>> means that most people can try out toydist right away. But what does
>> it gain these people who convert their setup.py files?
>
> Not much ATM, except that it is easier to write a toysetup.info
> compared to setup.py IMO, and that it supports a simple way to include
> data files (something which is currently *impossible* to do without
> writing your own distutils extensions). It also has the ability to
> build eggs without using setuptools (I consider not using setuptools a
> feature, given the too many failure modes of this package).

yeah, I always make setuptools unused in my packages by default.
However, I use command line arguments to enable the features of
setuptools when required (eggs, bdist_mpkg, etc.). Having a tool to
create eggs without setuptools would be great in itself. Definitely
list this in the feature list :)

> The main goals though are to make it easier to build your own tools on
> top of it, and to integrate with real build systems.

yeah, cool.

>> - a toydist convert that generates a setup.py file might be cool :)
>
> toydist started like this, actually: you would write a setup.py file
> which loads the package from toysetup.info, and can be converted to a
> dict argument to distutils.core.setup. I have not updated it recently,
> but that's definitely on the TODO list for a first alpha, as it would
> enable people to benefit from the format, with 100% backward
> compatibility with distutils.

yeah, cool.
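The generated-setup.py idea above could be very small. A sketch of the
parsing half, assuming a simplified, hypothetical "Field: value"
subset of the toysetup.info format (the real toydist format and API
are not shown here):

```python
def parse_info(text):
    """Parse 'Field: value' lines into a dict whose keys match
    distutils.core.setup() keyword arguments (name, version, ...)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    return fields

# The generated setup.py would then be just:
#   from distutils.core import setup
#   setup(**parse_info(open("toysetup.info").read()))
```

That is the 100% backward compatibility path: tools that only know how
to run setup.py still work, while the metadata lives in one declarative
file.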
That would let you develop things incrementally too, and still have
toydist be useful for the whole development period, until it catches up
with the features of distutils that are needed.

>> - arbitrary code execution happens when building or testing with
>> toydist.
>
> You are right for testing, but wrong for building. As long as the
> build is entirely driven by toysetup.info, you only have to trust
> toydist (which is not safe ATM, but that's an implementation detail),
> and your build tools of course.

If you execute build tools on arbitrary code, then arbitrary code
execution is easy for someone who wants to do bad things. Trust, and
secondarily sandboxing, are the best ways to solve these problems,
imho.

> Obviously, if you have a package which uses an external build tool on
> top of toysetup.info (as will be required for numpy itself for
> example), all bets are off. But I think that's a tiny fraction of the
> interesting packages for scientific computing.

yeah, currently about a quarter of science packages use
C/C++/fortran/cython etc. (see
http://pypi.python.org/pypi?:action=browse&c=40 - 110/458 on that
page). There seem to be a lot more packages using C/C++ there compared
to other types of packages (e.g. the zope3 packages list 0 out of 900
packages using C/C++). So the high number of C/C++ science-related
packages on pypi demonstrates that better C/C++ tools for scientific
packages are a big need. Especially getting compile/testing farms for
all these packages.

Compile farms are a bigger need here than for pure python packages,
since C/C++ is MUCH harder to write/test in a portable way. I would say
it is close to impossible to get code working without errors across
platforms, without quite good knowledge of multiple platforms. There
are many times with pygame development that I make changes on an osx,
windows or linux box, commit the change, then wait for the
compile/tests to run on the build farm
( http://thorbrian.com/pygame/builds.php ). Releasing packages
otherwise makes the process *heaps* longer...
and many times I still get errors on different platforms, despite many
years of multi-platform coding.

> Sandboxing is particularly an issue on windows - I don't know a good
> solution for windows sandboxing, outside of full vms, which are
> heavy-weights.

yeah, VMs are the way to go, if only to make each build a fresh
install. However, I think automated distributed building, and trust,
are more useful: only build those packages where you trust the authors,
and let anyone download, build, and then post their build/test results.
MS have given out copies of windows to some members of the python
community in the past, to set up VMs for building.

By automated distributed building, I mean what usually happens on
mailing lists - where people post their test results when they have a
problem - except in a more automated manner. Adding a 'Do you want to
upload your build/test results?' prompt at the end of a setup.py run
for subversion builds would give you dozens or hundreds of test results
daily, from all sorts of machines. Making it easy for people to set up
package builders which also upload their packages somewhere gives you
distributed package building, in a fairly safe automated manner.
(more details here:
http://renesd.blogspot.com/2009/09/python-build-bots-down-maybe-they-need.html )

>> - it should be possible to build this toydist functionality as a
>> distutils/distribute/buildout extension.
>
> No, it cannot, at least as far as distutils/distribute are concerned
> (I know nothing about buildout). Extending distutils is horrible, and
> fragile in general. Even autotools, with its mix of generated sh
> scripts through m4 and perl, is a breeze compared to distutils.
>
>> - extending toydist? How are extensions made? There are 175 buildout
>> packages which extend buildout, and many that extend
>> distutils/setuptools - so extension of build tools is a necessary
>> thing.
>
> See my answer earlier about interoperation with build tools.
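The 'upload your build/test results' idea could be tiny on the client
side: collect a small report and POST it somewhere. A sketch - the
field names and the idea of a collection server are assumptions, not
an existing service:

```python
import platform
import sys

def build_report(package, version, ok, log_tail):
    """Assemble a minimal build/test report for uploading.
    Field names are hypothetical; a real service would define
    its own schema."""
    return {
        "package": package,
        "version": version,
        "ok": ok,                          # did build + tests pass?
        "python": sys.version.split()[0],  # e.g. "2.6.4"
        "platform": platform.platform(),   # OS / arch string
        "log_tail": log_tail[-2000:],      # keep uploads small
    }

# Uploading would then be one HTTP POST of the JSON-encoded report
# to the (hypothetical) collection server.
```

With that in place, the setup.py prompt just calls build_report() and
posts the result if the user says yes.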
I'm still not clear on how toydist will be extended. I am, however, a
lot clearer about its goals.

cheers,

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion