Hi Jordi,
Thanks for sharing your initial work on this!
On 12/08/14 08:48, Jordi Blasco wrote:
Hi,
The slurm_job.py module relies on the --wrap option of the Slurm Workload
Manager, which simplifies the code quite a lot.
The code is still at a very early stage; it needs a clean-up and also
some more work on parallelbuild.py.
https://github.com/jordiblasco/easybuild-framework/tree/slurm
Easier to spot what actually changed:
https://github.com/jordiblasco/easybuild-framework/compare/slurm .
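[For readers unfamiliar with --wrap: the idea could be sketched roughly as below. This is a minimal illustration, not the actual slurm_job.py code; the easyconfig name, resource options, and the exact eb command line are assumptions.]

```python
import subprocess

def build_sbatch_command(easyconfig, time="12:00:00", cores=4):
    # assemble the sbatch invocation; the whole 'eb' run is passed as a
    # single shell string via --wrap, so no job script file is needed
    wrapped = "eb %s --robot" % easyconfig
    return ["sbatch",
            "--time=%s" % time,
            "--cpus-per-task=%d" % cores,
            "--wrap=%s" % wrapped]

def submit_build(easyconfig, **kwargs):
    # hand the command to sbatch; it replies with "Submitted batch job <id>"
    return subprocess.check_output(build_sbatch_command(easyconfig, **kwargs)).strip()
```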
The major problem we are facing is privilege escalation. We
use a special user account to install all the applications in
the right place and with the right permissions, but this requires root
privileges, and for that reason I have been looking for quick
alternatives.
Hmm, can you elaborate here? Why exactly would the special user account
require root privileges? W.r.t. privilege escalation, do you mean that e.g.
"newgrp - easybuild" followed by "eb --job ..." doesn't
yield the expected results (i.e., installing something with eb under the
'easybuild' group)?
I have developed an EB command-line wrapper that covers most of the
needs we have at NeSI. It doesn't resolve the dependencies into
separate jobs, but it allows us to build the applications on all the
architectures at the same time.
In addition to that, it provides some useful features. Thanks to
simple rules in the sudoers file, we can submit the jobs as the
aforementioned user, which solves all the potential conflicts with
the ACLs. It also provides a simple logging system that lets you
track who installed what, and when.
https://github.com/jordiblasco/slurm-utils
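[Without having seen the wrapper's internals, the sudo-plus-logging idea could look roughly like this. The user name, log path, sudoers rule, and function names are assumptions, not the actual slurm-utils code.]

```python
import getpass
import subprocess
import time

BUILD_USER = "easybuild"               # assumed name of the special install account
LOG_FILE = "/var/log/eb-installs.log"  # assumed log location

def log_install(easyconfig, logfile=LOG_FILE):
    # record who requested which build, and when
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as log:
        log.write("%s %s %s\n" % (stamp, getpass.getuser(), easyconfig))

def submit_as_build_user(easyconfig):
    # a sudoers rule along the lines of
    #   %hpc-admins ALL=(easybuild) NOPASSWD: /usr/bin/sbatch
    # lets admins submit jobs as the install account without full root
    cmd = ["sudo", "-u", BUILD_USER, "sbatch",
           "--wrap=eb %s --robot" % easyconfig]
    log_install(easyconfig)
    return subprocess.check_output(cmd)
```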
I hope it can be helpful.
Regards,
Jordi
On 9 August 2014 06:16, Pablo Escobar Lopez <[email protected]> wrote:
Hello Miguel :)
as you already mentioned, neither LSF nor Slurm is officially
supported yet. Anyway, even if they were supported, I would suggest
starting to learn how easybuild works without the --job option,
because that is not a widely tested option. So I think it's better
to start learning how easybuild works without submitting to a
scheduler, and once you are used to how easybuild works, then start
testing with the --job option.
The approach I use to run easybuild on different clusters is to
have a different easybuild config file for each of my clusters
(where I define different paths for install_dir or modules_dir),
and then run the same easyconfig (.eb file) on the different login
nodes using the specific easybuild config file for that cluster.
This way, I write a single easyconfig which I execute on each of
my clusters' login nodes, so the compilation is optimized for each
machine. Automating this is quite simple. If you want more
details about this specific setup, just email the list.
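[The per-cluster loop could be automated along these lines. The host names, config-file paths, and the exact flag for selecting an easybuild config file are assumptions (the flag name varies across EasyBuild versions); this is a sketch, not Pablo's actual script.]

```python
import subprocess

# assumed mapping of cluster login nodes to their easybuild config files
CLUSTERS = {
    "cluster-a.example.org": "/etc/easybuild/cluster-a.cfg",
    "cluster-b.example.org": "/etc/easybuild/cluster-b.cfg",
}

def build_everywhere_commands(easyconfig, clusters=CLUSTERS):
    # one ssh invocation per login node; each run uses that cluster's
    # config file, so install paths and optimization differ per machine
    return [
        ["ssh", host, "eb", "--configfiles=%s" % cfg, easyconfig, "--robot"]
        for host, cfg in sorted(clusters.items())
    ]

def build_everywhere(easyconfig):
    for cmd in build_everywhere_commands(easyconfig):
        subprocess.check_call(cmd)
```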
Best regards,
Pablo.
2014-08-08 18:08 GMT+02:00 Kenneth Hoste <[email protected]>:
Hi Riccardo,
On 08/08/14 17:48, Riccardo Murri wrote:
> Hi Miguel, all,
>
> On 8 August 2014 12:41, Miguel Bernabeu Diaz <[email protected]> wrote:
>> I'm not sure if all, or at least the most common, schedulers' CLI could be
>> abstracted in this manner, as I've only worked with Slurm and LSF. Either
>> way, would the community be interested in this kind of abstraction? Also,
>> has someone worked on something similar, or a port to Slurm or LSF we could
>> extend or reuse?
> We too would probably be interested in batch-system independence,
> although we're in no hurry. (This would fit in the framework of a project
> that will only start later this year.)
I agree this would be a very nice feature indeed. --job is very useful
for us, but it probably really only works for us.
You basically need Torque + pbs_python (and maybe even align the
versions a bit, to make it worse).
> Actually, if I am allowed a shameless self-plug, we already have a
> Python framework that can submit and manage jobs on different
> batch-queuing systems; see http://gc3pie.googlecode.com/
That sounds interesting!
Let me pick up a crazy project idea we wrote up some time ago:
https://gist.github.com/boegel/9225891. How does gc3pie relate to that?
> I am not familiar with EasyBuild internals, but GC3Pie's job control
> reduces to a few lines that should be relatively quick to plug in:
>
> from gc3libs import Application
> from gc3libs.core import Engine
>
> task = Application(['some', '-unix', '+command', 'here'], ...)
> engine = Engine(...)
> engine.add(task)
> # run task and wait for it to finish
> engine.progress()
>
> If there is interest, I can look at the sources and try to estimate
> how much work it would be to integrate GC3Pie and EasyBuild.
The first step should be to abstract the current support for --job into
a generic class, and make what's there now derive from that (probably
naming it PbsPython).
Then, SLURM & LSF could be just another version of that, and so can
gc3pie and DRMAA.
Unless gc3pie solves all our problems, that would even be
better. ;-)
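[The abstraction being proposed could look roughly like this. A hypothetical sketch: the class and method names below are illustrative, not EasyBuild's actual API.]

```python
from abc import ABC, abstractmethod

class JobBackend(ABC):
    """Generic job-submission interface --job could be abstracted into.

    Names here are illustrative, not EasyBuild's real API.
    """

    @abstractmethod
    def submit(self, command, name, deps=None):
        """Submit 'command' as a job named 'name', after jobs in 'deps'."""

    @abstractmethod
    def wait(self):
        """Block until all submitted jobs have finished."""

class PbsPython(JobBackend):
    # the current Torque + pbs_python support, refactored into one backend
    def submit(self, command, name, deps=None):
        raise NotImplementedError("would wrap pbs_python job submission")

    def wait(self):
        raise NotImplementedError

class Slurm(JobBackend):
    # SLURM, LSF, gc3pie or DRMAA would be sibling implementations
    def submit(self, command, name, deps=None):
        raise NotImplementedError("could shell out to 'sbatch --wrap'")

    def wait(self):
        raise NotImplementedError
```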
As the project idea gist shows, supporting different batch systems is
really a project on its own.
K.
--
Pablo Escobar López
HPC systems engineer
Biozentrum, University of Basel
Swiss Institute of Bioinformatics SIB
Email: [email protected]
Phone: +41 61 267 15 82
http://www.biozentrum.unibas.ch