Jan,
I agree that the direct Globus tools are not a solution to every problem.
Of course, one needs another layer to take care of brokering jobs across
grid resources, for example.
However, I see in Globus a system that is *almost there* for practical
cluster job submission. There are just two issues that complicate job
submission for my users:
1) the lack of a "prologue/epilogue script" option in the job submission
interface
2) no generic, scheduler-independent control of RAM per process
I regard these omissions as bugs.
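To make the two points concrete, here is a rough sketch of a GT4 WS-GRAM
job description as I understand it (element names should be checked
against the current schema; the executable and paths are placeholders).
There is a maxMemory element, but there is no standard place for a
prologue/epilogue step, and whether the memory limit is applied per
process is left to the local scheduler adapter:

```xml
<job>
  <!-- placeholder executable and arguments -->
  <executable>/home/user/bin/simulate</executable>
  <argument>--steps</argument>
  <argument>1000</argument>
  <count>16</count>
  <jobType>mpi</jobType>
  <!-- maxMemory is in megabytes; how (and whether) it becomes a
       per-process limit depends on the scheduler-specific adapter,
       e.g. the PBS one -->
  <maxMemory>512</maxMemory>
</job>
```

Something like "globusrun-ws -submit -Ft PBS -f job.xml" then submits
it, if I remember the factory-type flag correctly.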
Generally, it is bad policy to add another layer of software to compensate
for bugs in a lower layer. It amounts to putting bandages on bandages.
On the other hand, if we can fix these middle-layer problems, much better
higher-layer software can be built, and much more easily.
Cheers!
On 6.05.08, Jan Ploski wrote:
> Steve White wrote:
> >Jan,
> >
> >I agree with your assessment that the need to adjust memory use per
> >process is a general one in cluster job submission, that it is in some
> >way implemented by any underlying job management system, and that
> >these extensions ought not to be PBS-specific.
> >
> >I also looked at your "messy solution". (The code looks very professional,
> >really.) It won't do for my purposes, because I need to present a minimal,
> >easily understood solution.
> >
> >Let me explain my situation:
> >
> >None of the compute resources is under my control. I can point out
> >problems to admins, that is all.
> >
> >I have been assigned two jobs.
> >
> >Our users and I are familiar with conventional cluster job submission.
> >One job was to bring them into the grid fold by showing them the
> >advantages of globusrun-ws. If it can be shown to be a genuinely
> >cross-platform solution, giving them the ability to switch (almost)
> >effortlessly between grid clusters, the effort will be a success.
> >
> >My other job is to write a report on practical MPI job submission over
> >the grid.
> >
> >We have come a long way, but still have to deal with a couple of
> >practical details. At this point, it looks like both of them will end
> >up as work-arounds for an incomplete implementation of the job
> >submission interface in Globus.
> >
> >If these issues can be dealt with in a future release of Globus, grid
> >job submission will look very attractive to real researchers.
>
> Hi,
>
> Based on my experience with Globus, you might be following the wrong
> route (the route to disenchantment). I view Globus more as a middleware that
> has to be adapted (as in: "wrapped around" or "slightly modified")
> according to your users' needs and which plays an important role behind
> the scenes, but it probably should not be exposed directly to users as a
> drop-in replacement for their familiar job submission tools.
>
> There is a reason for this that is more important than the limitations
> you have discovered so far: Globus doesn't ship with command-line job
> management
> commands on par with those of TORQUE/Maui, Condor or SGE. If you let
> users submit jobs with globus-job-submit, the next thing they are going
> to ask you is "how can I see what jobs I have submitted", "how can I
> cancel the job or resubmit it elsewhere", "is my job running or not",
> "why is my job not running", "when is my job going to start", etc.
>
> You need something in front of Globus to make your users' life bearable.
> Some projects lean toward application-specific web portals (I think
> that's AstroGrid's approach). In our project, we have deployed a largely
> application-agnostic frontend based on Condor-G, but even so there was
> some customization and some user training required. The Condor-G
> approach might be relevant for you because it covers the scenario of
> making a transparent transition from a local batch system to a Grid -
> the Condor tools for submitting jobs and status querying are pretty much
> the same regardless of whether your job goes to a machine from a local
> pool (equivalent to an SGE or PBS-managed cluster) or to a pool of
> Globus hosts. (In fact, Condor can submit to GT2 [gLite], GT4, Unicore,
> and some more Grid middlewares.)
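> As a sketch (the hostname and service path are placeholders, and the
> grid_resource syntax is from memory, so check it against the Condor
> manual), a Condor-G submit description for a GT4 resource looks
> roughly like this:
>
> ```
> # grid universe routes the job through Condor-G instead of a local pool
> universe      = grid
> grid_resource = gt4 https://cluster.example.org:8443/wsrf/services/ManagedJobFactoryService PBS
> executable    = simulate
> output        = simulate.out
> error         = simulate.err
> log           = simulate.log
> queue
> ```
>
> The usual condor_submit, condor_q and condor_rm commands then answer
> the "what have I submitted, is it running, how do I cancel it"
> questions, regardless of whether the job went to a local pool or to a
> Globus host.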
>
> The disadvantage of Condor is that it is a rather huge software product
> and trying to understand all of it can be daunting. Still, I suppose you
> could get the Grid submission piece of it running in a couple of hours
> if you wish to give it a try (by following our tutorials and asking
> questions where necessary).
>
> Regards,
> Jan Ploski
>
--
- - - - - - - - - - - - - - - - - - - - - - - - -
Steve White +49(331)7499-202
e-Science / AstroGrid-D Zi. 35 Bg. 20
- - - - - - - - - - - - - - - - - - - - - - - - -
Astrophysikalisches Institut Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam
Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
- - - - - - - - - - - - - - - - - - - - - - - - -