Stuart

On 7.05.08, Stuart Martin wrote:
> On May 7, 2008, at 7:19 AM, Steve White wrote:
> 
> These are things we'd like to do, but have not been able to get to.
> 
> >
> >     1) the lack of a "prologue/epilogue script" in the job submission
> 
> We had a prototype of a GRAM service Java RM API that included  
> prologue and epilogue callouts, but we have not been able to give it  
> the attention needed to get it into release form.
>
I'm not sure I understand "RM API" and "callouts".  

I have expanded on my idea of a good user interface (for JDD anyway)
in Jan's bug report: http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5698

> >     2) generic control of RAM-per-process
> 
> We would like to move to JSDL where I would think this would be  
> covered, but after scanning, it looks like it isn't.  jsdl posix has  
> MemoryLimit, but that is for the job and not for each process in the  
> job.  So I don't think even JSDL provides this.
> 

This would suffice if implemented properly.  

The memory per process would be 
        mem_per_process = MemoryLimit / count

The number of cores to assign per node on a cluster with multi-core nodes
could be calculated as

        available_RAM_per_node / mem_per_process
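
A minimal sketch of that calculation in plain Python. All input values
(the job-wide MemoryLimit, process count, and per-node RAM) are
hypothetical; JSDL gives MemoryLimit in bytes:

```python
# Derive per-process memory and processes-per-node from a job-wide
# jsdl-posix MemoryLimit. Example values below are hypothetical.

def processes_per_node(memory_limit_bytes, count, node_ram_bytes):
    """memory_limit_bytes: MemoryLimit for the whole job (bytes).
    count: total number of processes in the job.
    node_ram_bytes: physical RAM available on one node."""
    mem_per_process = memory_limit_bytes // count
    # How many processes fit on one node without exceeding its RAM.
    return node_ram_bytes // mem_per_process

# Example: a 64-process job limited to 128 GiB total, run on
# nodes that each have 16 GiB of usable RAM.
GIB = 1024 ** 3
print(processes_per_node(128 * GIB, 64, 16 * GIB))  # -> 8
```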

Cheers!

> From the JSDL 1.0 doc
> >>>>
> jsdl-posix:MemoryLimit
> 
> 8.1.14 MemoryLimit Element
> 
> 8.1.14.1 Definition
> This element is a positive integer that describes the maximum amount  
> of physical memory that
> the job should use when executing. The amount is given in bytes. If  
> this is not present then the
> consuming system MAY choose its default value.
> <<<<
> 
> >
> >I regard these omissions as bugs.
> >
> >
> >Generally, it is bad policy to add another layer of software to  
> >compensate
> >for bugs in a lower layer.  It is to put bandages on bandages.
> >
> >On the other hand, if we can fix these middle-layer problems, much  
> >better
> >higher-layer software can be made, much more easily.
> >
> >Cheers!
> >
> >
> >On  6.05.08, Jan Ploski wrote:
> >>Steve White wrote:
> >>>Jan,
> >>>
> >>>I agree with your assessment that the need to adjust the memory  
> >>>use per
> >>>process is a general one in cluster job submission, and that it is  
> >>>in
> >>>some way implemented by any underlying job management system, and  
> >>>that
> >>>these extensions ought not to be PBS-specific.
> >>>
> >>>I also looked at your "messy solution".  (The code looks very  
> >>>professional,
> >>>really.)  It won't do for my purposes, because I need to present a  
> >>>minimal,
> >>>easily understood solution.
> >>>
> >>>Let me explain my situation:
> >>>
> >>>None of the compute resources is under my control.  I can point out
> >>>problems to admins, that is all.
> >>>
> >>>I have been assigned two jobs.
> >>>
> >>>I and our users are familiar with doing conventional cluster job
> >>>submission. One job was to bring them into the grid fold, showing  
> >>>them the
> >>>advantages
> >>>of globusrun-ws.  If it can be shown to be really a cross-platform
> >>>solution, giving them the ability to (almost) effortlessly switch
> >>>between grid clusters, the effort will be a success.
> >>>
> >>>My other job is to write a report on practical MPI job submission  
> >>>over
> >>>the grid.
> >>>
> >>>We have come a long way, but still have to deal with a couple of  
> >>>practical
> >>>details.  At this point, it looks like both of them will end up as
> >>>work-arounds to incomplete implementation of a job submission  
> >>>interface
> >>>in Globus.
> >>>
> >>>If with a future release of Globus, these issues can be dealt  
> >>>with, grid
> >>>job submission will look very attractive to real researchers.
> >>
> >>Hi,
> >>
> >>Based on my experience with Globus, you might be following a wrong  
> >>route
> >>(the route to disenchantment). I view Globus more as a middleware  
> >>that
> >>has to be adapted (as in: "wrapped around" or "slightly modified")
> >>according to your users' needs and which plays an important role  
> >>behind
> >>the scenes, but it probably should not be exposed directly to users  
> >>as a
> >>drop-in replacement for their familiar job submission tools.
> >>
> >>There is a reason for that more important than the limitations you  
> >>have
> >>discovered so far: Globus doesn't ship with command-line job  
> >>management
> >>commands on par with those of TORQUE/Maui, Condor or SGE. If you let
> >>users submit jobs with globus-job-submit, the next thing they are  
> >>going
> >>to ask you is "how can I see what jobs I have submitted", "how can I
> >>cancel the job or resubmit it elsewhere", "is my job running or not",
> >>"why is my job not running", "when is my job going to start", etc.
> >>
> >>You need something in front of Globus to make your users' life  
> >>bearable.
> >>Some projects lean toward application-specific web portals (I think
> >>that's AstroGrid's approach). In our project, we have deployed a  
> >>largely
> >>application-agnostic frontend based on Condor-G, but even so there  
> >>was
> >>some customization and some user training required. The Condor-G
> >>approach might be relevant for you because it covers the scenario of
> >>making a transparent transition from a local batch system to a Grid -
> >>the Condor tools for submitting jobs and status querying are pretty  
> >>much
> >>the same regardless of whether your job goes to a machine from a  
> >>local
> >>pool (equivalent to an SGE or PBS-managed cluster) or to a pool of
> >>Globus hosts. (In fact, Condor can submit to GT2 [gLite], GT4,  
> >>Unicore,
> >>and some more Grid middlewares.)
> >>
> >>The disadvantage of Condor is that it is a rather huge software  
> >>product
> >>and trying to understand all of it can be daunting. Still, I  
> >>suppose you
> >>could get the Grid submission piece of it running in a couple of  
> >>hours
> >>if you wish to give it a try (by following our tutorials and asking
> >>questions where necessary).
> >>
> >>Regards,
> >>Jan Ploski
> >>
> >
> 

-- 
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Steve White                                             +49(331)7499-202
e-Science / AstroGrid-D                                   Zi. 35  Bg. 20
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Astrophysikalisches Institut Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam

Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz

Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
