Thanks, Gabe, for commenting back. I appreciate your comments!

Cheers,
Chris
On Apr 17, 2012, at 9:25 AM, Resneck, Gabriel M (388J) wrote:

> Chris is right. The concepts of Load and Capacity do have meaning, and if I
> gave the impression that these constructs are meaningless then that's my bad.
> What I was trying to convey in my explanation was simply that Load and
> Capacity have no automatic relation to the resources required by jobs or
> available in nodes, and in order to provide that relation you must configure
> load and capacity according to levels that the user sees as appropriate.
>
> Gabe =)
>
> ________________________________________
> From: Mattmann, Chris A (388J) [[email protected]]
> Sent: Monday, April 16, 2012 8:27 PM
> To: <[email protected]>
> Cc: Wong, Cynthia L (388J)
> Subject: Re: Capacity vs Load in Resource Manager
>
> Hi Gabe,
>
> On Apr 16, 2012, at 11:44 AM, Resneck, Gabriel M (388J) wrote:
>
>>
>> To use Chris's words, when using the "fresh-out-of-the-box" version of the
>> RM, both of the concepts of Capacity and Load are entirely arbitrary.
>
> I'd clarify that while the default values set for these concepts are
> arbitrary, the concepts themselves are not. Capacity is used
> by the AssignmentMonitor and is a core property of the ResourceNode class.
> Load is leveraged by the AssignmentMonitor
> to determine how busy each of the ResourceNodes currently is.
>
>> They have no relation to any kind of resources available on your node
>> machines.
>
> Well, again, the default out-of-the-box values for these concepts don't, but
> the concepts themselves do.
>
>> Therefore, if you give each job a load of 1 (regardless of the node
>> resources required to run the job) and if you give a node a capacity of 10,
>> the RM will try to always have 10 jobs running on that node.
>> It does nothing to track resource usage on the node, so use of such a
>> paradigm as the one that I just described could be wildly inefficient.
>
> Let's clarify that again. Saying it *does nothing* kind of doesn't sound
> right to me.
> It *does* do something. It tracks how
> much load is currently on a node, compared to its current capacity, and
> provides that information as-is to the Scheduler,
> which in turn feeds it to a node "besting" algorithm that decides which
> node to batch a job out to. So, it does *do something*. It's just that it's
> not real-time; it's more like virtual profiling. And, let's be specific.
> The XMLAssignmentMonitor decides how this information will be used and
> provided and tracked. This is just one
> potential implementation of the AssignmentMonitor RM extension point.
>
> We could (and should) develop a Ganglia resource monitor that could leverage
> Ganglia information to plug in. And
> we could develop a TorqueAssignmentMonitor that uses qmon or something like
> it to parse the information out of
> Torque's queue. We could also connect in to Sun Grid Engine (SGE) or another
> DRM technology to get this
> information too.
>
>
>> Because these numbers are arbitrary, I recommend carefully investigating the
>> availability of resources on your nodes and setting load and capacity levels
>> using that information. For example, if you find that your jobs tend to be
>> I/O bound when you have more than 3 running simultaneously on the same node,
>> then you could set your job load to 1 and the node capacity to 3. If you
>> wanted more granularity, you could easily set the load to 33 and the
>> capacity to 100. Since these numbers are entirely arbitrary, you have the
>> freedom to make such changes. Obviously, not all jobs will be the same, so
>> you may want to assign different loads to different jobs and assign
>> different capacities to nodes based upon the resources that each makes
>> available.
>
> Exactly. And to add to that, you can group different jobs into different
> queues, and then assign queues to nodes, to control the flow of jobs
> onto those nodes based on a "queue type".
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
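
For anyone reading this thread later, here is a rough sketch of the bookkeeping discussed above: each node has a configured capacity, each job carries a configured load, and a node is only eligible while its remaining capacity covers the job's load. This is NOT the Resource Manager's code or API. The names `Node`, `pick_node`, and `assign` are made up for illustration, and the "most free capacity wins" rule is only an assumed stand-in for whatever "besting" logic a real Scheduler applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a resource node (not the OODT class)."""
    node_id: str
    capacity: int   # configured by the operator, not measured on the machine
    load: int = 0   # sum of the configured loads of jobs assigned so far

    def free(self) -> int:
        """Remaining virtual capacity on this node."""
        return self.capacity - self.load


def pick_node(nodes: List[Node], job_load: int) -> Optional[Node]:
    """Assumed 'besting' rule: the eligible node with the most free capacity."""
    candidates = [n for n in nodes if n.free() >= job_load]
    if not candidates:
        return None  # nothing can take the job right now
    return max(candidates, key=Node.free)


def assign(nodes: List[Node], job_load: int) -> Optional[Node]:
    """Pick a node and record the job's load against it."""
    node = pick_node(nodes, job_load)
    if node is not None:
        node.load += job_load
    return node
```

With a job load of 1 and a node capacity of 3, the fourth call to `assign` returns `None` until a job finishes. Using a load of 33 against a capacity of 100 gives the same limit of three jobs, but leaves room to give lighter or heavier jobs different loads. Nothing here looks at actual CPU, memory, or I/O on the machine, which is the point made in the thread: the numbers only mean what you configure them to mean.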
