Thanks, Gabe, for following up. I appreciate your comments!

Cheers,
Chris

On Apr 17, 2012, at 9:25 AM, Resneck, Gabriel M (388J) wrote:

> Chris is right.  The concepts of Load and Capacity do have meaning, and if I 
> gave the impression that these constructs are meaningless then that's my bad.
> What I was trying to convey in my explanation was simply that Load and 
> Capacity have no automatic relation to the resources required by jobs or 
> available in nodes, and in order to provide that relation you must configure 
> load and capacity according to levels that the user sees as appropriate.
> 
> Gabe =)
> 
> ________________________________________
> From: Mattmann, Chris A (388J) [[email protected]]
> Sent: Monday, April 16, 2012 8:27 PM
> To: <[email protected]>
> Cc: Wong, Cynthia L (388J)
> Subject: Re: Capacity vs Load in Resource Manager
> 
> Hi Gabe,
> 
> On Apr 16, 2012, at 11:44 AM, Resneck, Gabriel M (388J) wrote:
> 
>> 
>> To use Chris's words, when using the "fresh-out-of-the-box" version of the 
>> RM, both of the concepts of Capacity and Load are entirely arbitrary.
> 
> I'd clarify that while the default values set for these concepts are 
> arbitrary, the concepts themselves are not. Capacity is used
> by the AssignmentMonitor and is a core property of the ResourceNode class. 
> Load is used by the AssignmentMonitor
> to determine how busy each ResourceNode currently is.
> 
>> They have no relation to any kind of resources available on your node 
>> machines.
> 
> Well, again, the default out of the box values for these concepts don't, but 
> the concepts themselves do.
> 
>> Therefore, if you give each job a load of 1 (regardless of the node 
>> resources required to run the job) and if you give a node a capacity of 10, 
>> the RM will try to always have 10 jobs running on that node.
>> It does nothing to track resource usage on the node, so use of such a 
>> paradigm as the one that I just described could be wildly inefficient.
> 
> Let's clarify that again. Saying it *does nothing* kind of doesn't sound 
> right to me. It *does* do something. It tracks how
> much load is currently on a node, compared to its current capacity, and 
> provides that information as-is to the Scheduler,
> which in turn feeds that information into a node "besting" algorithm to 
> decide which node to batch a job out to. So, it does *do something*. It's 
> just that the tracking is not real-time; it is more like virtual profiling. 
> And, let's be specific.
> The XMLAssignmentMonitor decides how this information will be used and 
> provided and tracked. This is just one
> potential implementation of the AssignmentMonitor RM extension point.
> 
> We could (and should) develop a Ganglia resource monitor that could leverage 
> Ganglia information to plug in. And
> we could develop a TorqueAssignmentMonitor that uses qmon or something like 
> it to parse the information out of
> Torque's queue. We could also connect in to Sun Grid Engine (SGE) or another 
> DRM technology to get this
> information too.
> 
> 
>> Because these numbers are arbitrary, I recommend carefully investigating the 
>> availability of resources on your nodes and setting load and capacity levels 
>> using that information.  For example, if you find that your jobs tend to be 
>> I/O bound when you have more than 3 running simultaneously on the same node, 
>> then you could set your job load to 1 and the node capacity to 3.  If you 
>> wanted more granularity, you could easily set the load to 33 and the 
>> capacity to 100.  Since these numbers are entirely arbitrary, you have the 
>> freedom to make such changes.  Obviously, not all jobs will be the same, so 
>> you may want to assign different loads to different jobs and assign 
>> different capacities to nodes based upon the resources that each makes 
>> available.
> 
> Exactly. And to add to that, you can group different jobs into different 
> queues, and then map queues to nodes, to control the flow of jobs
> onto those nodes, based on a "queue type".
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
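
For anyone skimming the thread, the load/capacity bookkeeping discussed above 
can be sketched in a few lines. This is only an illustration, not the OODT 
Resource Manager code: Node, fill and pick_node are hypothetical names, and the 
"most spare capacity wins" rule is an assumption, since the thread does not say 
how the Scheduler actually ranks nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a resource node (not the OODT ResourceNode API)."""
    node_id: str
    capacity: int
    load: int = 0

    def can_accept(self, job_load: int) -> bool:
        # A job fits only if the declared load stays within the declared capacity.
        return self.load + job_load <= self.capacity


def fill(node: Node, job_load: int) -> int:
    """Assign equal-load jobs to one node until it is full; return how many fit."""
    if job_load <= 0:
        raise ValueError("job_load must be positive")
    count = 0
    while node.can_accept(job_load):
        node.load += job_load
        count += 1
    return count


def pick_node(nodes, job_load: int):
    """Assumed selection rule: the fitting node with the most spare capacity."""
    candidates = [n for n in nodes if n.can_accept(job_load)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.capacity - n.load)


if __name__ == "__main__":
    print(fill(Node("n1", capacity=10), 1))    # load 1, capacity 10
    print(fill(Node("n2", capacity=3), 1))     # load 1, capacity 3
    print(fill(Node("n3", capacity=100), 33))  # load 33, capacity 100
```

Note that nothing here looks at CPU, memory or I/O on the machine. The numbers 
mean only what you configure them to mean, which is why load 1 / capacity 3 and 
load 33 / capacity 100 both admit the same number of simultaneous jobs.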


