On Tue, Oct 29, 2013 at 9:32 AM, Cody Permann <codyperm...@gmail.com> wrote:
>
> On Tue, Oct 29, 2013 at 5:54 AM, ernestol <ernes...@lncc.br> wrote:
>
> > I am using a cluster with 23 nodes for a total of 184 cores, and each node
> > additionally has 16GB of RAM. I was thinking that maybe the problem is in
> > the code, because if I run on up to 3 processors I don't have any problems,
> > but when I try with 4 or more I get this problem.
>
So you have 8 cores per node, and 2 GB of RAM per core, which is pretty
standard.
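The per-core figure follows directly from the cluster numbers quoted above. Here is a trivial Python sketch of that arithmetic. The node count, core count, and RAM per node are taken from the quoted message; the variable names are just for illustration.

```python
# Cluster figures quoted in the original message.
total_cores = 184
nodes = 23
ram_per_node_gb = 16

# 184 cores spread evenly over 23 nodes.
cores_per_node = total_cores // nodes

# Each node's 16 GB shared by its cores.
ram_per_core_gb = ram_per_node_gb / cores_per_node

print(f"{cores_per_node} cores/node, {ram_per_core_gb} GB/core")
```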
I ran your 200^3 code on my Mac workstation and watched the memory usage in
Activity Monitor.
The results were somewhat surprising as I added cores:
1 core: 2.22 GB/core
2 cores: 4.0 GB/core
3 cores: slightly more than 4.0 GB/core
4 cores: machine went into swap (I think) after approaching about 3.5
GB/core, but the code eventually finished
5 cores: machine again went into swap at around 3.3 GB/core, but finished
eventually
My workstation has 20 GB of RAM, so, including the OS, I can see how
approaching 16 GB might cause it to go into swap.
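What competes for physical RAM is the total across all processes, i.e. the per-core figure times the number of cores. A minimal Python sketch of that multiplication using the approximate numbers listed above. Two assumptions: the 3-core entry ("slightly more than 4.0") is treated as a 4.0 GB lower bound, and the 4- and 5-core entries are taken as the values at which swapping appeared to start.

```python
# Approximate per-process memory (GB) read off Activity Monitor, keyed by the
# number of processes. Assumption: the 3-core entry is a 4.0 GB lower bound,
# and the 4- and 5-core entries are where swapping appeared to begin.
observed_gb_per_core = {1: 2.22, 2: 4.0, 3: 4.0, 4: 3.5, 5: 3.3}
workstation_ram_gb = 20.0

# Total footprint across all processes.
totals_gb = {n: n * per_core for n, per_core in observed_gb_per_core.items()}

for n, total in totals_gb.items():
    share = total / workstation_ram_gb
    print(f"{n} procs: ~{total:.1f} GB total, {share:.0%} of workstation RAM")
```

Read this way, the 4- and 5-core runs sit at roughly 14 to 16.5 GB before the OS is counted, which fits the swapping described above.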
But what is happening when we go from 1 to 2 cores that causes the memory
usage per core to double?!
Note that in all cases the memory quickly jumps to about 2.22 GB/core. In
the 1-processor case it stays there, but in the 2-5 processor cases, after
reaching about 2 GB/core, it slowly ramps up to the approximately 4 GB/core
listed above.
This, combined with the error message you received (which comes from Metis),
leads me to believe that the partitioner is taking up a ton of memory
(the partitioner doesn't run on 1 proc). So the questions become:
1.) Is the partitioner taking up a lot more memory than it conceivably
should? (Seems like yes.)
2.) Is it taking up more than it used to? I.e., has a bug been introduced
recently? (Metis and Parmetis were last updated in April 2013, so pretty
recently, actually.)
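To put question 1 in perspective, here is a rough back-of-envelope estimate of how large the element adjacency graph handed to a graph partitioner might be. This is only a sketch under stated assumptions, none of which come from the actual libMesh partitioner code: "200^3" is taken to mean a 200 x 200 x 200 structured grid of hexahedra, graph edges connect elements that share a face, and the graph is stored in the CSR (`xadj`/`adjncy`) layout METIS uses, with 4- or 8-byte indices depending on how METIS was built. The function name `csr_graph_bytes` is hypothetical.

```python
def csr_graph_bytes(n: int, index_bytes: int = 8) -> int:
    """Approximate storage for the face-adjacency graph of an n x n x n hex grid.

    Hypothetical helper. Assumes CSR storage: xadj has (num_elements + 1)
    entries and adjncy has one entry per (element, face-neighbor) pair, i.e.
    two per interior face. Weights and partitioner workspace are ignored.
    """
    num_elements = n ** 3
    # (n - 1) interior planes per direction, n^2 faces per plane, 3 directions.
    interior_faces = 3 * n * n * (n - 1)
    xadj_entries = num_elements + 1
    adjncy_entries = 2 * interior_faces
    return (xadj_entries + adjncy_entries) * index_bytes


if __name__ == "__main__":
    for width in (4, 8):
        gb = csr_graph_bytes(200, width) / 1e9
        print(f"{width}-byte indices: ~{gb:.2f} GB")
```

On those assumptions the raw graph comes out well under half a gigabyte, small next to the roughly 1.8 GB/core growth seen above. That is consistent with the suspicion in question 1, though only profiling the real code would show where the memory actually goes.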
I don't know if reverting to a prior version of Metis/Parmetis is easily
done at this point, but the relevant hashes where the refresh happened seem
to be:
e80824e86a
1c4b6a0d12
I may take a stab at this after lunch... Cody has been seeing similar
issues recently as well.
--
John
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel