Jan,

I agree that the direct Globus tools are not a solution to every problem.
Of course, one needs another layer to take care of brokering jobs across
grid resources, for example.

However, I see in Globus a system that is *almost there* for practical
cluster job submission.  There are just two issues that complicate job
submission for my users:
        1) the lack of a "prologue/epilogue script" in the job submission
        2) the lack of generic control of RAM per process
I regard these omissions as bugs.

Generally, it is bad policy to add another layer of software to compensate
for bugs in a lower layer.  It amounts to putting bandages on bandages.

On the other hand, if we can fix these middle-layer problems, much better
higher-layer software can be made, much more easily.

Cheers!


On  6.05.08, Jan Ploski wrote:
> Steve White wrote:
> >Jan,
> >
> >I agree with your assessment that the need to adjust the memory use per
> >process is a general one in cluster job submission, and that it is in
> >some way implemented by any underlying job management system, and that
> >these extensions ought not to be PBS-specific.
> >
> >I also looked at your "messy solution".  (The code looks very professional,
> >really.)  It won't do for my purposes, because I need to present a minimal,
> >easily understood solution.
> >
> >Let me explain my situation:
> >
> >None of the compute resources is under my control.  I can point out
> >problems to admins, that is all.
> >
> >I have been assigned two jobs.
> >
> >I and our users are familiar with doing conventional cluster job
> >submission.  One job was to bring them into the grid fold, showing them
> >the advantages of globusrun-ws.  If it can be shown to be really a
> >cross-platform solution, giving them the ability to (almost) effortlessly
> >switch between grid clusters, the effort will be a success.
> >
> >My other job is to write a report on practical MPI job submission over
> >the grid.
> >
> >We have come a long way, but still have to deal with a couple of practical
> >details.  At this point, it looks like both of them will end up as
> >work-arounds for an incomplete implementation of a job submission
> >interface in Globus.
> >
> >If with a future release of Globus, these issues can be dealt with, grid
> >job submission will look very attractive to real researchers.
> 
> Hi,
> 
> Based on my experience with Globus, you might be following a wrong route 
> (the route to disenchantment). I view Globus more as a middleware that 
> has to be adapted (as in: "wrapped around" or "slightly modified") 
> according to your users' needs and which plays an important role behind 
> the scenes, but it probably should not be exposed directly to users as a 
> drop-in replacement for their familiar job submission tools.
> 
> There is a reason for that more important than the limitations you have 
> discovered so far: Globus doesn't ship with command-line job management 
> commands on par with those of TORQUE/Maui, Condor or SGE. If you let 
> users submit jobs with globus-job-submit, the next thing they are going 
> to ask you is "how can I see what jobs I have submitted", "how can I 
> cancel the job or resubmit it elsewhere", "is my job running or not", 
> "why is my job not running", "when is my job going to start", etc.
> 
> You need something in front of Globus to make your users' lives bearable. 
> Some projects lean toward application-specific web portals (I think 
> that's AstroGrid's approach). In our project, we have deployed a largely 
> application-agnostic frontend based on Condor-G, but even so there was 
> some customization and some user training required. The Condor-G 
> approach might be relevant for you because it covers the scenario of 
> making a transparent transition from a local batch system to a Grid - 
> the Condor tools for submitting jobs and status querying are pretty much 
> the same regardless of whether your job goes to a machine from a local 
> pool (equivalent to an SGE or PBS-managed cluster) or to a pool of 
> Globus hosts. (In fact, Condor can submit to GT2 [gLite], GT4, Unicore, 
> and some more Grid middlewares.)
> 
> The disadvantage of Condor is that it is a rather huge software product 
> and trying to understand all of it can be daunting. Still, I suppose you 
> could get the Grid submission piece of it running in a couple of hours 
> if you wish to give it a try (by following our tutorials and asking 
> questions where necessary).
> 
> Regards,
> Jan Ploski
> 

-- 
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Steve White                                             +49(331)7499-202
e-Science / AstroGrid-D                                   Zi. 35  Bg. 20
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Astrophysikalisches Institut Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam

Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz

Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
