Drew Parsons a écrit le 06/04/2020 à 05:08 :
> On 2020-04-06 09:56, Drew Parsons wrote:
>> On 2020-04-06 01:48, Gilles Filippini wrote:
>>> Drew Parsons a écrit le 05/04/2020 à 18:57 :
>>>>
>>>> Another option is to create an environment variable to force h5py to
>>>> load the mpi version even when run in a serial environment without
>>>> mpirun. Easy enough to set up, though I'm interested to see if "mpirun
>>>> -n 1 dh_auto_build" or a variation of that is viable.  Maybe
>>>> %:
>>>>     mpirun -n 1 dh $@ --with python3 --buildsystem=pybuild
>>>
>>> This, way the test cases run against python3.7 is OK, but it fails
>>> against python3.8 with:
>>>
>>> I: pybuild base:217: cd
>>> /build/bitshuffle-z2ZvpN/bitshuffle-0.3.5/.pybuild/cpython3_3.8_bitshuffle/build;
>>>
>>> python3.8 -m unittest discover -v
>>> [pinibrem15:43725] OPAL ERROR: Unreachable in file ext3x_client.c at
>>> line 112
>>> *** An error occurred in MPI_Init_thread
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***    and potentially your MPI job)
>>> [pinibrem15:43725] Local abort before MPI_INIT completed completed
>>> successfully, but am not able to aggregate error messages, and not able
>>> to guarantee that all other processes were killed!
>>> E: pybuild pybuild:352: test: plugin distutils failed with: exit code=1:
>>> cd
>>> /build/bitshuffle-z2ZvpN/bitshuffle-0.3.5/.pybuild/cpython3_3.8_bitshuffle/build;
>>>
>>> python3.8 -m unittest discover -v
>>> dh_auto_test: error: pybuild --test -i python{version} -p "3.7 3.8"
>>> returned exit code 13
>>>
>>> But the HDF5 error is no more present with python3.7. So it seems a good
>>> point.
>>
>>
>> Strange again.  I would have expected the same behaviour in python3.8
>> and python3.7, whether successful or unsuccessful.
> 
> 
> Putting dh into mpirun seems to be interfering with process spawning. 
> Once MPI is initialised (for the python3.7 test) it's not reinitialised
> for the python3.8 and so it's in a bad state for the test. Something
> like that.
> 
> It's only in the tests where h5py is invoked that we get the problems.
> This variant works, applying mpirun separately for each test run:
> 
> override_dh_auto_test:
>         set -e; \
>         for py in `py3versions -s -v`; do \
>           mpirun -n 1 pybuild --test -i python{version} -p $$py; \
>         done
> 
> (could use mpirun -n $(NPROC) for real mpi testing).

Yes, it works! \o/

> Do we want to use this as a solution? Or would you prefer an environment
> variable that h5py can check to allow mpi invocation on a serial process?

I let this decision up to you. Whatever you choose it deserve a bit fat
note in README.Debian.

> Note that this means bitshuffle as built now is expressly tied in with
> hdf5-mpi and h5py-mpi (this seems intentional by debian/rules and
> debian/control, though the Build-Depends must be updated to
> python3-h5py-mpi).  It's a separate question whether it's desirable to
> also support a hdf5-serial build of bitshuffle.  Likewise we need to
> think about what we want to happen when bitshuffle is invoked in a
> serial process.

I'll let that to the bitshuffle maintainer. I'll propose a patch to fix
the current FTBFS, sticking on the mpi flavour to be conservative vs
bitshuffle's previous builds.

> I think part of the confusion here is that bitshuffle (at least in the
> tests) is double-handling the HDF5 library, with direct calls on the one
> hand, but indirectly through h5py as well, on the other hand.

Thanks,

_g.

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to