On Tue, Oct 29, 2013 at 11:19 AM, John Peterson <jwpeter...@gmail.com> wrote:
> On Tue, Oct 29, 2013 at 9:32 AM, Cody Permann <codyperm...@gmail.com> wrote:
>
>>
>> On Tue, Oct 29, 2013 at 5:54 AM, ernestol <ernes...@lncc.br> wrote:
>>
>> > I am using a cluster with 23 nodes for a total of 184 cores, and each
>> > node additionally has 16 GB of RAM. I was thinking that the problem may
>> > be in the code, because if I run on up to 3 processors I don't have any
>> > problems, but when I try with 4 or more I get this problem.
>>
>
> So you have 8 cores per node, and 2 GB of RAM per core, which is pretty
> standard.
>
> I ran your 200^3 code on my Mac workstation and watched the memory usage
> in Activity Monitor.
>
> The results were somewhat surprising as I added cores:
>
> 1 core: 2.22 GB/core
> 2 cores: 4.0 GB/core
> 3 cores: slightly more than 4.0 GB/core
> 4 cores: machine went into swap (I think) after approaching about 3.5
> GB/core, but the code eventually finished
> 5 cores: machine again went into swap at around 3.3 GB/core, but finished
> eventually
>
> My workstation has 20 GB of RAM, so including the OS I guess I could see
> how approaching 16 GB might cause it to go into swap.
>
> But, what is happening when we go from 1 to 2 cores that causes the memory
> usage per core to double?!
>
> Note that in all cases the memory quickly jumps to about 2.22 GB/core. In
> the 1 processor case it stays there, but in the 2-5 processor cases, after
> reaching about 2 GB/core, it slowly ramps up to the approximately 4 GB/core
> listed above.
>
> This, combined with the error message you received (which comes from
> Metis) leads me to believe that the partitioner is taking up a ton of
> memory (partitioner doesn't run on 1 proc). So the questions become:
>
> 1.) Is the partitioner taking up a lot more memory than it conceivably
> should? (Seems like yes.)
> 2.) Is it taking up more than it used to? I.e., has a bug been introduced
> recently? (Metis and Parmetis were last updated in April 2013, so pretty
> recently, actually.)
>
> I don't know whether reverting to a prior version of Metis/Parmetis is
> easily done at this point, but the relevant hashes where the refresh
> happened seem to be:
>
> e80824e86a
> 1c4b6a0d12
>
> I may take a stab at this after lunch... Cody has been seeing similar
> issues recently as well.
>
I confirmed that changing the partitioner does seem to reduce the overall
memory usage appreciably.

LinearPartitioner
1 core: 2.22 GB/core
2 cores: about 2.7 GB/core peak
3 cores: same as 2 cores
4 cores: about 2.6 GB/core

CentroidPartitioner
1 core: 2.22 GB/core
2 cores: about 3 GB/core peak
4 cores: about 2.8 GB/core peak

SFCPartitioner
1 core: 2.22 GB/core
2 cores: slightly > 3 GB/core peak
4 cores: almost exactly the same GB/core as the 2-core case

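As a rough cross-check of these per-core figures, multiplying each peak by the core count gives an approximate total footprint to compare against the 20 GB in the workstation. This is only arithmetic on the Activity Monitor readings above, not a new measurement; the helper name total_gb is made up for illustration, and it ignores the OS and any shared pages.

```cpp
#include <cassert>
#include <cstdio>

// Approximate total memory (in GB) when `cores` processes each peak at
// `gb_per_core`. Hypothetical helper: a crude upper-level estimate that
// ignores the OS footprint and any pages shared between processes.
double total_gb(int cores, double gb_per_core)
{
  assert(cores > 0);
  return cores * gb_per_core;
}

// Print one comparison line for a given partitioner and core count.
void report(const char* partitioner, int cores, double gb_per_core)
{
  std::printf("%-20s %d cores x %.2f GB/core = %.1f GB total\n",
              partitioner, cores, gb_per_core,
              total_gb(cores, gb_per_core));
}
```

With the 4-core readings, total_gb(4, 3.5) is 14 GB for Metis (the point where swapping started), versus total_gb(4, 2.6) = 10.4 GB for the LinearPartitioner and total_gb(4, 2.8) = 11.2 GB for the CentroidPartitioner.
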
Using the Activity Monitor does not provide a huge amount of accuracy, but
I think the trends are about the same for the Linear, Centroid, and SFC
partitioners, and make a lot more sense than the Metis results. In
particular, I was able to run on 4 cores without going into swap.
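
For more precise numbers than Activity Monitor gives, one option is to have each process report its own peak resident set size via getrusage(). The following is a minimal sketch, not something I ran for the results above; the helper names peak_rss_bytes and bytes_to_gb are made up for illustration. Note that ru_maxrss is in bytes on Mac OS X but in kilobytes on Linux.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdio>

// Peak resident set size of the calling process, in bytes, or -1 on failure.
// Hypothetical helper. ru_maxrss is reported in bytes on Mac OS X and in
// kilobytes on Linux, so the value is normalized here.
long long peak_rss_bytes()
{
  struct rusage usage;
  if (getrusage(RUSAGE_SELF, &usage) != 0)
    return -1;
#if defined(__APPLE__)
  return static_cast<long long>(usage.ru_maxrss);
#else
  return static_cast<long long>(usage.ru_maxrss) * 1024LL;
#endif
}

// Convert a byte count to GB (2^30 bytes).
double bytes_to_gb(long long bytes)
{
  assert(bytes >= 0);
  return bytes / (1024.0 * 1024.0 * 1024.0);
}

// Print the current peak with a label, e.g. before and after partitioning.
void print_peak(const char* label)
{
  std::printf("%s: peak RSS = %.2f GB\n", label, bytes_to_gb(peak_rss_bytes()));
}
```

Since the peak only ever grows, calling print_peak() on each processor just before and just after the mesh is partitioned would show whether the high-water mark is set during partitioning or elsewhere.
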
--
John
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel