Re: [gt-user] How to set cores-per-node in WS job submission?

Stuart Martin Wed, 07 May 2008 12:38:40 -0700

On May 7, 2008, at 7:19 AM, Steve White wrote:

Jan,
I agree that the direct Globus tools are not a solution to every problem. Of course, one needs another layer to take care of brokering jobs across
grid resources, for example.

However, I see in Globus a system that is *almost there* for practical
cluster job submission. There are just two issues that complicate job
submission for my users:


Steve,

These are things we'd like to do, but we have not been able to get to them.


        1) the lack of a "prologue/epilogue script" in the job submission

We had a prototype of a GRAM service Java RM API that included prologue and epilogue callouts, but we have not been able to give it the attention needed to get it into release form.

        2) generic control of RAM-per-process

We would like to move to JSDL, where I would think this would be covered, but after scanning the spec, it looks like it isn't. jsdl-posix has MemoryLimit, but that is for the job and not for each process in the job. So I don't think even JSDL provides this.


From the JSDL 1.0 doc
>>>>
jsdl-posix:MemoryLimit

8.1.14 MemoryLimit Element

8.1.14.1 Definition

This element is a positive integer that describes the maximum amount of physical memory that the job should use when executing. The amount is given in bytes. If this is not present then the consuming system MAY choose its default value.
<<<<
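To make the per-job versus per-process distinction concrete, here is a minimal Python sketch that builds a bare JSDL document carrying jsdl-posix:MemoryLimit. The namespace URIs are the JSDL 1.0 ones; the function name `build_jsdl`, the executable path, and the idea of multiplying a per-process figure by the process count are illustrative assumptions, not a recommended mapping. The point is that the only number MemoryLimit can carry is a job-wide total, so how that total is divided among the processes is left to the consuming system.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace URIs for the core schema and the POSIX application extension.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def build_jsdl(executable: str, processes: int, bytes_per_process: int) -> ET.Element:
    """Build a minimal JSDL job with a single, job-wide MemoryLimit.

    Hypothetical helper: MemoryLimit applies to the whole job, so the
    per-process requirement can only be folded into one total; nothing
    in the document says how that total is split across processes.
    """
    if processes < 1 or bytes_per_process < 1:
        raise ValueError("processes and bytes_per_process must be positive")
    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    # Job-wide limit in bytes; there is no per-process counterpart in jsdl-posix.
    ET.SubElement(posix, f"{{{POSIX_NS}}}MemoryLimit").text = str(
        processes * bytes_per_process
    )
    return job
```

For example, `build_jsdl("/path/to/app", processes=8, bytes_per_process=2 * 1024**3)` yields a single MemoryLimit of 17179869184 bytes; nothing in the element says that each of the eight processes should get 2 GiB.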

I regard these omissions as bugs.
Generally, it is bad policy to add another layer of software to compensate
for bugs in a lower layer. It amounts to putting bandages on bandages.
On the other hand, if we can fix these middle-layer problems, much better
higher-layer software can be made, much more easily.

Cheers!
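Until something like the prologue/epilogue callouts described above is available in GRAM, the usual stopgap is exactly the kind of extra layer argued against here: submit a small wrapper as the job's executable and let it run the hooks around the real command. A minimal Python sketch, assuming the execution host has a Python interpreter; `run_with_hooks` and the hook commands are hypothetical and not part of Globus.

```python
import subprocess
from typing import Optional, Sequence


def run_with_hooks(
    main_cmd: Sequence[str],
    prologue: Optional[Sequence[str]] = None,
    epilogue: Optional[Sequence[str]] = None,
) -> int:
    """Run an optional prologue, the main command, then an optional epilogue.

    Hypothetical wrapper semantics: if the prologue fails, the job is
    skipped and the prologue's exit code is returned. Once the main
    command has started, the epilogue runs even if it fails, so cleanup
    still happens. Returns the main command's exit code.
    """
    if prologue:
        rc = subprocess.run(prologue).returncode
        if rc != 0:
            return rc
    try:
        rc = subprocess.run(main_cmd).returncode
    finally:
        # Cleanup hook; its own exit status is deliberately ignored.
        if epilogue:
            subprocess.run(epilogue)
    return rc
```

A job would then name the wrapper as its executable and pass the real command as arguments; how the wrapper finds the site's hook scripts (fixed paths, environment variables, extra arguments) is a convention each site would have to settle. Whether the wrapper runs once per job or once per process depends on how the local job manager launches the executable.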


On 6.05.08, Jan Ploski wrote:
Steve White wrote:
Jan,
I agree with your assessment that the need to adjust the memory use per process is a general one in cluster job submission, that it is in some way implemented by every underlying job management system, and that
these extensions ought not to be PBS-specific.
I also looked at your "messy solution". (The code looks very professional, really.) It won't do for my purposes, because I need to present a minimal,
easily understood solution.

Let me explain my situation:

None of the compute resources is under my control.  I can point out
problems to admins, that is all.

I have been assigned two jobs.

Our users and I are familiar with conventional cluster job
submission. One job was to bring them into the grid fold, showing them the
advantages of globusrun-ws. If it can be shown to be a truly cross-platform
solution, giving them the ability to (almost) effortlessly switch
between grid clusters, the effort will be a success.
My other job is to write a report on practical MPI job submission over
the grid.
We have come a long way, but still have to deal with a couple of practical
details. At this point, it looks like both of them will end up as
workarounds for an incomplete implementation of the job submission interface
in Globus.
If, with a future release of Globus, these issues can be dealt with, grid
job submission will look very attractive to real researchers.
Hi,
Based on my experience with Globus, you might be following the wrong route (the route to disenchantment). I view Globus more as middleware that
has to be adapted (as in "wrapped around" or "slightly modified")
according to your users' needs and that plays an important role behind the scenes, but it probably should not be exposed directly to users as a
drop-in replacement for their familiar job submission tools.
There is a more important reason for that than the limitations you have discovered so far: Globus doesn't ship with command-line job management
commands on par with those of TORQUE/Maui, Condor or SGE. If you let
users submit jobs with globus-job-submit, the next thing they are going
to ask you is "how can I see what jobs I have submitted", "how can I
cancel the job or resubmit it elsewhere", "is my job running or not",
"why is my job not running", "when is my job going to start", etc.
You need something in front of Globus to make your users' lives bearable.
Some projects lean toward application-specific web portals (I think
that's AstroGrid's approach). In our project, we have deployed a largely application-agnostic frontend based on Condor-G, but even so some
customization and some user training were required. The Condor-G
approach might be relevant for you because it covers the scenario of
making a transparent transition from a local batch system to a Grid:
the Condor tools for submitting jobs and querying status are pretty much the same regardless of whether your job goes to a machine in a local
pool (equivalent to an SGE- or PBS-managed cluster) or to a pool of
Globus hosts. (In fact, Condor can submit to GT2 [gLite], GT4, Unicore,
and several other Grid middlewares.)
The disadvantage of Condor is that it is a rather huge software product, and trying to understand all of it can be daunting. Still, I suppose you could get the Grid submission piece of it running in a couple of hours
if you wish to give it a try (by following our tutorials and asking
questions where necessary).

Regards,
Jan Ploski
--
- - - - - - - - - - - - - - - - - - - - - - - - -
Steve White                     +49(331)7499-202
e-Science / AstroGrid-D         Zi. 35, Bg. 20
- - - - - - - - - - - - - - - - - - - - - - - - -
Astrophysikalisches Institut Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam

Board: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
Foundation under private law, Brandenburg register of foundations: III/7-71-026
- - - - - - - - - - - - - - - - - - - - - - - - -
