All,

Thank you for your replies. I'll check these out and may bring in more questions.

Regards,
Cynthia

Cynthia Wong
--
Cynthia L. Wong
Data Management Systems and Technologies
Jet Propulsion Laboratory
4800 Oak Grove Drive, M/S 171-264, Pasadena, CA 91109-8099
Phone: 818/393-2572, Email: [email protected]

On 4/16/12 8:27 PM, "Mattmann, Chris A (388J)" <[email protected]> wrote:

>Hi Gabe,
>
>On Apr 16, 2012, at 11:44 AM, Resneck, Gabriel M (388J) wrote:
>
>>
>> To use Chris's words, when using the "fresh-out-of-the-box" version of
>>the RM, both of the concepts of Capacity and Load are entirely
>>arbitrary.
>
>I'd clarify that while the default values set for these concepts are
>arbitrary, the concepts themselves are not. Capacity is used
>by the AssignmentMonitor and is a core property of the ResourceNode
>class. Load is leveraged by the AssignmentMonitor
>to determine the current busyness of one of the ResourceNodes.
>
>> They have no relation to any kind of resources available on your node
>>machines.
>
>Well, again, the default out-of-the-box values for these concepts don't,
>but the concepts themselves do.
>
>> Therefore, if you give each job a load of 1 (regardless of the node
>>resources required to run the job) and if you give a node a capacity of
>>10, the RM will try to always have 10 jobs running on that node.
>> It does nothing to track resource usage on the node, so use of such a
>>paradigm as the one that I just described could be wildly inefficient.
>
>Let's clarify that again. Saying it *does nothing* kind of doesn't sound
>right to me. It *does* do something. It tracks how
>much load is currently on a node, compared to its current capacity, and
>provides that information as-is to the Scheduler,
>which then in turn uses the information to determine a node "besting"
>algorithm to determine what node to select to
>batch a job out to. So, it does *do something*. It's just that it's not
>real-time; it's more virtual profiling. And, let's be specific.
>The XMLAssignmentMonitor decides how this information will be used and
>provided and tracked. This is just one
>potential implementation of the AssignmentMonitor RM extension point.
>
>We could (and should) develop a Ganglia resource monitor that could
>leverage Ganglia information to plug in. And
>we could develop a TorqueAssignmentMonitor that uses qmon or something
>like it to parse the information out of
>Torque's queue. We could also connect in to Sun Grid Engine (SGE) or
>another DRM technology to get this
>information too.
>
>
>> Because these numbers are arbitrary, I recommend carefully
>>investigating the availability of resources on your nodes and setting
>>load and capacity levels using that information. For example, if you
>>find that your jobs tend to be I/O bound when you have more than 3
>>running simultaneously on the same node, then you could set your job
>>load to 1 and the node capacity to 3. If you wanted more granularity,
>>you could easily set the load to 33 and the capacity to 100. Since
>>these numbers are entirely arbitrary, you have the freedom to make such
>>changes. Obviously, not all jobs will be the same, so you may want to
>>assign different loads to different jobs and assign different capacities
>>to nodes based upon the resources that each makes available.
>
>Exactly. And to add to that, you can group different jobs into different
>queues, and then queues to nodes, to control flow of jobs
>onto those nodes, based on a "queue type".
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Senior Computer Scientist
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 171-266B, Mailstop: 171-246
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Assistant Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
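
For reference, the capacity/load bookkeeping discussed in the thread can be
sketched in a few lines of Python. This is only an illustration of the idea
(virtual load tracked against an operator-chosen capacity, plus a simple
"most spare capacity wins" selection rule). It is not the Resource Manager's
code: the names Node, pick_node, and assign are hypothetical, and the
selection rule is an assumption rather than the actual "besting" algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a resource node; not the OODT ResourceNode class.
    name: str
    capacity: int   # arbitrary units chosen by the operator
    load: int = 0   # sum of the loads of the jobs currently assigned

    def spare(self) -> int:
        return self.capacity - self.load


def pick_node(nodes: List[Node], job_load: int) -> Optional[Node]:
    """Return the node with the most spare capacity that fits the job, or None."""
    candidates = [n for n in nodes if n.spare() >= job_load]
    if not candidates:
        return None
    # Assumed selection rule: prefer the node with the most spare capacity.
    return max(candidates, key=lambda n: n.spare())


def assign(nodes: List[Node], job_load: int) -> Optional[Node]:
    """Assign a job to the selected node and record its load; None if nothing fits."""
    node = pick_node(nodes, job_load)
    if node is not None:
        node.load += job_load
    return node
```

With a node capacity of 3 and a job load of 1, the fourth call to assign
returns None because the node is full. A capacity of 100 with a job load of
33 gives the same limit of three jobs, but leaves room to give other jobs
finer-grained loads. Nothing here measures real CPU or I/O usage, which is
the point made in the thread: the numbers mean only what the operator
decides they mean.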
