Hi Mike,

On 22/04/2020 02:32, Mike Kelsey wrote:
Hello, again.  I am working on creating a module for one of my experiment's
internal Python packages.  Normally, we check the package out of our GitBlit
repository and install it using |pip install --user|.  I wrote a .eb file
using 'PythonPackage', and specified the appropriate git_config options to
check everything out.  But the build fails during the final sanity check,
because it doesn't find the dependencies:

== FAILED: Installation ended unsuccessfully (build directory: /scratch/group/mitchcomp/eb/tmp/build/CDMSDataCatalog/0.9.2/system-system-Python-3.6.6): build failed (first 300 chars): cmd "pip check" exited with exit code 1 and output:
cdmsdatacatalog 0.9.2 requires datacat, which is not installed.
cdmsdatacatalog 0.9.2 requires tqdm, which is not installed.

This package contains the required setup.py, which itself has the argument,

       install_requires=['datacat @ git+https://github.com/slaclab/datacat.git#subdirectory=client/python',
                         'requests',
                         'tqdm'],

I thought the |pip install| action took care of parsing this list, doing all
the requisite downloads internally, and leaving you with a fully functional
package with all of its dependencies satisfied.

Do we have to copy and adapt that list into EasyBuild language?  If so, how
does one specify the Python-specific suffixes on the Git URL?  And how does
one then propagate all of that to PIP so it knows what to find?  The
exts_list structure does not support this (in particular, it doesn't support
git_config, which I'm working on myself).

In EasyBuild we instruct pip not to download and install missing dependencies automatically (which it does by default), because that often leads to installations you can't reproduce later using the same easyconfig file.

The "pip check" step is something we have only recently started requiring for easyconfigs in our central repository. We do so because we have learned the hard way that things break unless all required Python packages are specified in the easyconfig file (see https://github.com/easybuilders/easybuild-easyconfigs/issues/10462 for a recent case of easyconfigs that were broken for exactly this reason).
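
To make the failure above more concrete, here is a rough sketch of the kind of presence check that "pip check" reports on. It is not pip's implementation (pip also verifies version constraints); the helper names requirement_name and missing_requirements are made up for this illustration, and it assumes Python 3.8 or newer for importlib.metadata, so it would not run as-is under the Python 3.6.6 used in the failing build.

```python
import re
from importlib.metadata import PackageNotFoundError, distribution


def requirement_name(requirement):
    """Return the bare project name from a requirement string.

    Handles plain names ('tqdm'), version specifiers ('tqdm>=4.0') and
    direct references ('datacat @ git+https://...').
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError("cannot parse requirement: %r" % requirement)
    return match.group(1)


def missing_requirements(install_requires):
    """List the required project names that are not installed.

    Only presence is checked here; version constraints are ignored,
    which is a simplification compared to what pip does.
    """
    missing = []
    for requirement in install_requires:
        name = requirement_name(requirement)
        try:
            distribution(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The install_requires list quoted from setup.py earlier in this thread.
    install_requires = [
        "datacat @ git+https://github.com/slaclab/datacat.git"
        "#subdirectory=client/python",
        "requests",
        "tqdm",
    ]
    print(missing_requirements(install_requires))
```

In the failing build, a check along these lines would have flagged datacat and tqdm, since nothing in the easyconfig had installed them before CDMSDataCatalog.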

For https://github.com/slaclab/datacat.git, you could either use git_config (thanks a lot for your work on making that supported for extensions as well!), or you could just download a source tarball provided by GitHub (https://github.com/slaclab/datacat/archive/stable.tar.gz for example), and use that. The 'subdirectory' part probably corresponds to the 'start_dir' easyconfig parameter?

pip will find already installed Python packages via $PYTHONPATH, so you don't need to do anything special to keep "pip check" happy. Just list every required Python package, with a specific version, in the easyconfig file using PythonBundle. You can leave out packages that are already included with the Python installation you're using, or that are provided by another dependency like SciPy-bundle.
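
As a rough illustration of what such a PythonBundle easyconfig could look like (easyconfig files use Python syntax), here is an untested sketch. Several things in it are assumptions rather than facts from this thread: the 'X.Y.Z' and 'stable' version labels, the homepage and description, the moduleclass, the CDMSDataCatalog source tarball name, and whether 'start_dir' is honoured for extensions in your EasyBuild version. The parameter names use_pip and sanity_pip_check are given as I understand them, so please check them against the EasyBuild version you use.

```python
# Hypothetical easyconfig sketch; placeholder values must be replaced.
easyblock = 'PythonBundle'

name = 'CDMSDataCatalog'
version = '0.9.2'
versionsuffix = '-Python-3.6.6'

homepage = 'https://example.org/'  # placeholder, not the real homepage
description = "CDMS data catalog client (placeholder description)"

# Matches the 'system-system' part of the build directory in the error.
toolchain = {'name': 'system', 'version': 'system'}

# Assumption: Python 3.6.6 is available as a module for this toolchain.
dependencies = [('Python', '3.6.6')]

use_pip = True
sanity_pip_check = True

exts_defaultclass = 'PythonPackage'

# Extensions are installed in order, so requirements come first.
# 'requests' is not listed because "pip check" did not complain about it,
# which suggests it is already provided by the Python installation.
exts_list = [
    # 'X.Y.Z' is a placeholder: pin the tqdm version you actually want.
    ('tqdm', 'X.Y.Z'),
    # 'stable' is only a label for the GitHub branch tarball.
    ('datacat', 'stable', {
        'source_urls': ['https://github.com/slaclab/datacat/archive/'],
        'sources': ['stable.tar.gz'],
        # Mirrors '#subdirectory=client/python' from install_requires.
        'start_dir': 'client/python',
    }),
    # Hypothetical tarball name; git_config may be an option here
    # once it is supported for extensions.
    (name, version, {
        'sources': ['CDMSDataCatalog-0.9.2.tar.gz'],
    }),
]

moduleclass = 'tools'
```

Listing CDMSDataCatalog last means that datacat and tqdm are already installed by the time its own "pip check" runs.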

There's a script floating around (that we should integrate into EasyBuild itself) that makes this a bit easier, see https://gist.github.com/boegel/fd9a636d652aa5c8e57778088e9c0a21 (and an improved version at https://gist.github.com/Flamefire/49426e502cd8983757bd01a08a10ae0d).

This reminds me that I should get back to writing up documentation on writing easyconfig files for (bundles of) Python packages...


regards,

Kenneth


                                                -- Mike Kelsey
