Hi Renato,
It's not that the jobs' memory use increases. Even if I request mem=6gb or pmem=6gb, the job still goes to a node whose total memory is less than 6 GB. That is why I thought that by setting NODEAVAILABILITYPOLICY I would be able to define availability on the basis of memory. Just as we define np= in the nodes file, do we have to define memory resources somewhere too?
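For reference, this is roughly how I am submitting the jobs (the script name is just an example):

```shell
# Request one node and 6 GB of memory for the whole job;
# pmem=6gb would instead request 6 GB per process.
qsub -l nodes=1,mem=6gb myjob.sh
```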
Thanks,
Abhi.


Renato Borges wrote:
Hi Abhi!

On Wed, Dec 15, 2010 at 7:21 PM, Abhishek Gupta <[email protected]> wrote:

    Hi,

    I am trying to figure out a way to ensure that memory usage does not
    exceed the available memory on a node. I was thinking that this parameter
    ( NODEAVAILABILITYPOLICY COMBINED:MEM ) should check the availability of
    a node on the basis of available memory, but it does not.
    Is there anything else I need to add to make it work?
    NODEAVAILABILITYPOLICY COMBINED:MEM

    Thanks,
    Abhi.


I've never used NODEAVAILABILITYPOLICY, but I have a similar problem: the jobs we run at my site start out with a small memory footprint and end with large amounts of data in memory (in virtualization lingo, they "balloon"). Maybe this is also your case, and that is why setting this parameter doesn't work?

To avoid swapping, I have set a MAXJOBPERUSER limit for each compute node, because all of our jobs with an increasing memory footprint come from a single user (actually, a grid account).

Tweaking the MAXJOBPERUSER value, I have found a limit for each node (we have a heterogeneous cluster) that runs the jobs without swapping.

However, this is not ideal, because the setting applies to all jobs that run on a given node, and some local users have jobs that are small in memory but large in number of cores, so the limits I set for the grid jobs are too restrictive for them. Whereas the grid account can only run 4 jobs on an 8-core, 8 GB RAM node, local users' jobs could merrily run on all 8 cores simultaneously.
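Concretely, the per-node setting looks roughly like this in my maui.cfg (node names and limits here are only illustrative):

```text
# maui.cfg -- cap jobs per node from any single user (illustrative values)
NODECFG[node01] MAXJOBPERUSER=4   # 8 cores, 8 GB RAM: grid jobs swap above 4
NODECFG[node02] MAXJOBPERUSER=6   # a larger-memory node tolerates more
```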

Trying to find a better solution, I found that one can set this in Torque (supposing you use Torque):

qmgr -c "set queue XXX resources_min.mem=2gb"

And this would (theoretically) assign only nodes that have at least 2 GB of free memory to jobs waiting in the XXX queue. I say "theoretically" because I have not had luck with this setting. As I said, our grid jobs balloon, so our nodes get one job per slot: initially (for the first few hours) the jobs are only downloading data, so there is always 2 GB free. But when the memory balloons, we start swapping heavily.
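One related thing that might help (an untested sketch on my side) is giving jobs a default memory request, so the scheduler at least has a number to count against each node for jobs that request nothing:

```shell
# Untested sketch: jobs submitted to queue XXX without an explicit
# -l mem request inherit this default.
qmgr -c "set queue XXX resources_default.mem=2gb"
```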

I guess you might have more luck with this if your jobs' memory footprint is more constant. Or, if some guru could teach us how to "reserve" a certain amount of memory per job, I know that would suit me perfectly.

Cheers,
Renato.
--
Renato Callado Borges
Lab Specialist - DFN/IF/USP
Email: [email protected] <mailto:[email protected]>
Phone: +55 11 3091 7105
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
