Dear Martin,
Martin Luessi writes:
> First, let me explain the reason why anyone would want to do so.
> For work, I use Python extensively for scientific computing. However,
> I do not have administrator rights on my workstation and the
> distribution we use (CentOS) does not have the latest Python packages
> that are needed for scientific computing. In addition, even if CentOS
> had the packages, it wouldn't be feasible to constantly ask the
> sysadmins to install/update packages. One solution is to use a
> scientific Python distribution from a commercial vendor, e.g., Canopy
> from Enthought or Anaconda from Continuum Analytics. While these
> distributions work quite well, they are expensive for non-academic
> users and they are not very flexible, i.e., it can be difficult to
> install packages that are not in the package repository provided by
> the vendor, especially if the packages need additional dependencies. I
> also have a gentoo-prefix setup on my workstation.
Me too. I use Gentoo Prefix for Python-centered scientific computing on
the cluster at my institute.
> However, the whole prefix directory is very large as it makes minimal
> assumptions about the libraries provided by the host system. The size
> is a problem when using it over NFS e.g. on a cluster. Also, I have
> found that it is difficult to get X11 applications working as the
> gentoo-prefix will install its own X server etc.
>
> This made me wonder whether portage could be used to build a
> scientific Python installation. My idea is instead of making very
> minimal assumptions about the libraries provided by the host system
> (as done in a normal prefix install), one could generate a world file
> listing all the libraries provided by the host system and freeze their
> versions using package.mask. That way, programs and libraries in the
> prefix would link to libraries on the host system whenever possible,
> which would make the prefix smaller. By having a gentoo based
> scientific Python installation, one could take advantage of all the
> packages provided by gentoo-science and it would make it easy to
> install Python packages that depend on non-Python libraries.
> What do you guys think, is this feasible?
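To make the proposal above concrete, here is a minimal Python sketch of the version-freezing step. It assumes a hypothetical, hand-maintained inventory that already maps host (CentOS) libraries to Gentoo atoms; the function name `freeze_host_packages` and the sample versions are illustrative placeholders, not taken from any existing tool. For each atom it emits a world entry and a package.mask line of the form `>category/package-VERSION`, which masks every version newer than the host's.

```python
"""Sketch: turn a host-library inventory into Portage world/package.mask lines.

Assumption: the inventory below is hypothetical. Mapping real CentOS RPMs to
Gentoo atoms would have to be done by hand or by some other tool.
"""

from typing import Dict, List, Tuple


def freeze_host_packages(inventory: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (world_lines, mask_lines) for a {"cat/pkg": "version"} mapping.

    Each world line is the bare "cat/pkg" atom; each mask line masks every
    version newer than the host's, e.g. ">sys-libs/zlib-1.2.3".
    """
    world_lines: List[str] = []
    mask_lines: List[str] = []
    for atom in sorted(inventory):  # sorted for stable, diff-friendly output
        version = inventory[atom]
        world_lines.append(atom)
        mask_lines.append(f">{atom}-{version}")
    return world_lines, mask_lines


if __name__ == "__main__":
    # Hypothetical host inventory; the versions are placeholders, not CentOS facts.
    host = {
        "sys-libs/zlib": "1.2.3",
        "media-libs/libpng": "1.2.49",
    }
    world, mask = freeze_host_packages(host)
    print("\n".join(world))
    print("\n".join(mask))
```

Masking newer versions only stops Portage from upgrading past the host version; on its own it does not make Portage treat the host copy as already installed. Portage's `package.provided` file is the mechanism meant for declaring packages supplied outside Portage, so a real attempt would likely need that as well.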
Let me try to argue against it.
1. Disk space is extremely cheap now, about $1/GB. A Prefix will occupy
at most 5GB, with an average of 2GB and a minimum of less than 1GB.
1a. NFS is not a good place to put the build directory. What I do is
set PORTAGE_TMPDIR="/dev/shm" or point it at some other tmpfs. That
way builds run at a reasonable speed.
2. We are actually going the other way round: isolating Prefix from the
host libraries as much as possible. We have even reached an
(experimental) stage where only the host's kernel is used[a].
Why? Because staying compatible with a wide range of library versions
is not possible. Even the kernel version can break something[b], and
even the present Prefix gets broken by host libraries that behave
unexpectedly. Red Hat builds its products on ancient software for a
reason: stability.
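Regarding point 1a: before pointing PORTAGE_TMPDIR at a directory, it can help to confirm that the directory really lives on a tmpfs and not on NFS. The following Python sketch does that by reading `/proc/mounts` (a Linux-only assumption) and picking the longest mount point that contains the path; the helper names `fs_type_for` and `is_tmpfs` are hypothetical. The setting itself would then go into the Prefix's make.conf.

```python
"""Sketch: check whether a candidate PORTAGE_TMPDIR sits on a tmpfs mount.

Linux-only assumption: mounts are read from /proc/mounts, whose lines look
like "<device> <mountpoint> <fstype> <options> 0 0". Mount points containing
octal-escaped characters (e.g. spaces) are not decoded by this sketch.
"""

import os
import sys


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`."""
    path = os.path.normpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        # Longest matching mount point wins; ">=" lets a later mount on the
        # same point shadow an earlier one.
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_tmpfs(path: str, mounts_text: str) -> bool:
    return fs_type_for(path, mounts_text) == "tmpfs"


if __name__ == "__main__":
    candidate = sys.argv[1] if len(sys.argv) > 1 else "/dev/shm"
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            mounts = fh.read()
        real = os.path.realpath(candidate)  # follow symlinks such as /dev/shm -> /run/shm
        print(f"{real}: {fs_type_for(real, mounts)}")
    else:
        print("/proc/mounts not found; this sketch assumes Linux")
```

Matching on the longest containing mount point (rather than a plain string prefix) keeps `/devices` from being mistaken for something under `/dev`.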
My suggestion is to ignore the space Prefix occupies and focus on
features, stability and maintainability instead.
Benda
a. http://blogs.gentoo.org/news/2013/11/01/gentoo-monthly-newsletter-31-october-2013/#RAP
b. https://bugs.gentoo.org/show_bug.cgi?id=493074