On Sat, 2009-11-07 at 00:14 +0200, Alex Grönholm wrote:
[clip: problems in distributing scientific Python packages]
> I for one did not understand the problem. What does CPAN have that PyPI
> doesn't?
> It is natural for packages (distributions, in distutils terms) to have
> dependencies on each other. Why is this a problem?
Personally, I've not had much trouble with PyPI, but rather with the rest
of the toolchain. What's special about scientific software is that:

- It's usually not pure Python.
- It needs support not only for C but also for e.g. Fortran compilers.
- It may need to be built on platforms where libraries etc. are in
  non-standard places.
- It may be useful to build it with non-gcc compilers.
- It may need to ship more data files etc. than plain Python modules.
- Python is a newcomer on the scientific scene. Not everyone wants to
  spend time on installation problems, and not everyone is an experienced
  Python user.

So the following things are more likely to hurt when distributing these
Python modules:

1. Incomplete documentation for distutils. For example, where can you
   find out what the `package_data` option of setup() wants as input?
   What if you have your package in src/packagename and data files under
   data/? What are the paths given to it relative to?

   The Distribute documentation is starting to look quite reasonable, so
   documentation is becoming less of a problem. But it still seems to
   assume that the reader is familiar with distutils.

2. Magic. For example, what decides which files are included by sdist?
   It appears this depends on (i) what's in the autogenerated
   *.egg-info/SOURCES.txt, (ii) whether you are using SVN and setuptools,
   (iii) possible package_data etc. options, and (iv) MANIFEST or maybe
   MANIFEST.in. IMHO, the system is too byzantine in ordinary matters,
   which increases the number of things you need to learn.

3. Many layers: distutils, setuptools, numpy.distutils. Numpy has its
   own distutils extensions, primarily for Fortran support.

4. Inflexibility. The toolchain is a bit inflexible: suppose you need to
   do something "custom" during the build, say, detect
   sizeof(long double) and add a #define to the build options
   accordingly. Finding out how to do this properly again takes time.

5. Bad failure modes.
Distutils and the tools derived from it have bad failure modes. This
hurts most when building extension modules. Given the many layers, and
the fact that the build is driven by software that few people really
understand, it is difficult to diagnose and fix even simple errors.
Suppose a build fails because your C or Fortran compiler is passed a
flag it doesn't like: how do you work around this? Suppose you have a
library installed in a non-standard location: how do you tell distutils
to look for it in the right place? (The answer is to invoke the
"build_ext" command separately and pass it -L, but this is difficult to
discover, since "build" does not accept -L.)

The last one is quite annoying in practice: given the heterogeneous
environments out there, it's not easy to make your package buildable on
all the platforms where people might want to use it. When people run
into problems, they are stumped by the complexity of distutils.

The above concerns only building packages -- perhaps there is more to
say about the other parts as well. Also, I don't really have much
experience with CPAN or CRAN, so I can't say how much better or worse
off Python is here.

-- 
Pauli Virtanen

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
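[The build_ext workaround for non-standard library locations can also be
recorded in setup.cfg, so users don't have to discover the command-line
split themselves. Distutils reads per-command sections from setup.cfg; the
paths below are illustrative only.]

```
[build_ext]
library_dirs = /opt/local/lib
include_dirs = /opt/local/include
```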
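[To make the package_data complaint concrete: here is a minimal sketch of
the setup() arguments for the src/packagename layout mentioned above. The
package name, version, and file names are made up; the point is that
package_data globs are resolved relative to each listed package's own
directory, not the project root.]

```python
# Hypothetical project layout (names illustrative only):
#
#   setup.py
#   src/packagename/__init__.py
#   src/packagename/data/table.csv
#
# These arguments would be passed as distutils.core.setup(**METADATA)
# from setup.py.
METADATA = dict(
    name="packagename",
    version="0.1",
    # Tell distutils that importable packages live under src/.
    package_dir={"": "src"},
    packages=["packagename"],
    # package_data globs are interpreted relative to each package's own
    # directory -- src/packagename/ here -- so this matches
    # src/packagename/data/*.csv, not a top-level data/ directory.
    package_data={"packagename": ["data/*.csv"]},
)
print(METADATA["package_data"])
```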
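[On the sizeof(long double) example: one lightweight way to probe it,
without wiring a compile-and-run check into distutils, is to ask ctypes on
the build machine. This assumes the build machine matches the target, so
it does not cover cross-compilation; it is a sketch of the idea, not the
"proper" distutils way.]

```python
import ctypes

# sizeof(long double) on the machine running this script -- a stand-in
# for a compile-time configuration probe.
size = ctypes.sizeof(ctypes.c_longdouble)

# A (name, value) pair in the form that distutils' Extension(...) accepts
# via define_macros; it becomes -DSIZEOF_LONG_DOUBLE=<size> on the
# compiler command line.
macro = ("SIZEOF_LONG_DOUBLE", str(size))
print(macro)
```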