Åke Sandgren <[email protected]> writes:

> On 12/5/19 11:40 AM, Loris Bennett wrote:
>> I have tried this with 
>> 
>>   #!/bin/bash
>> 
>>   #SBATCH --job-name=easybuild_gpu
>>   #SBATCH --ntasks=4
>>   #SBATCH --time=12:00:00
>>   #SBATCH --mem-per-cpu=1G
>>   #SBATCH --partition=gpu
>>   #SBATCH --qos=medium
>> 
>>   srun eb Keras-2.2.4-fosscuda-2019a-Python-3.7.2.eb --robot
>
> Drop the srun part. You don't want to start 4 eb's doing the same thing.
> That may be the reason for your error.

Doesn't EasyBuild use the number of available cores for parallel make?

>> but get the error
>> 
>>   == FAILED: Installation ended unsuccessfully (build directory:
>> /trinity/shared/easybuild/build/TensorFlow/1.13.1/fosscuda-2019a-Python-3.7.2):
>> build failed (first 300 chars): Failed to chmod/chown several paths:
>> ['/trinity/shared/easybuild/build/TensorFlow/1.13.1/fosscuda-2019a-Python-3.7.2',
>> '/trinity/shared/easybuild/build/TensorFlow/1.13.1/fosscuda-2019a-Python-3.7.2/protobufpython',
>> '/trinity/shared/easybuild/build/TensorFlow/1.13.1/fosscuda-2019a-Python-3.7.2/abslpy
>> (took 4 sec)
>> 
>> I'm running the Slurm job as the same user I use always to run
>> Easybuild, so all the above directories are already owned by that user.
>> 
>> Any ideas about what I might be doing wrong?
>
> You need to look in the log file to see what the actual error is, the
> summary just tells you something went wrong.

The actual error is 

  last error: [Errno 30] Read-only file system:
  '/trinity/shared/easybuild/build/TensorFlow/1.13.1/fosscuda-2019a-Python-3.7.2/TensorFlow/tensorflow-1.13.1/tools/python_bin_path.sh'

Indeed, the NFS directory containing EasyBuild and all the software was
mounted read-only on the compute nodes.

So I remounted it read-write, but I still get the same error :-/
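
For what it's worth, one way to check from inside a job whether the
build directory is really writable on a compute node might be something
like the sketch below. The helper name check_writable is purely
illustrative (it is not part of EasyBuild or Slurm), and the probe
simply tries to create and delete a small file in the given directory.

```shell
#!/bin/bash
# Sketch: report whether a directory is writable from the current node.
# check_writable is a hypothetical helper, not part of EasyBuild or Slurm.
check_writable() {
    local dir="$1"
    local probe="${dir}/.write_probe.$$"

    # Try to create a small probe file, then remove it again.
    if touch "${probe}" 2>/dev/null; then
        rm -f "${probe}"
        echo "writable: ${dir}"
    else
        echo "NOT writable: ${dir}"
        return 1
    fi
}
```

Saved as a script with a final line such as
`check_writable /trinity/shared/easybuild/build`, this could be run on
a compute node via srun in the gpu partition. Running
`findmnt -T /trinity/shared/easybuild -o TARGET,OPTIONS` on the node
should also show whether the mount options really contain rw after the
remount.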

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
