On Tue, Oct 29, 2013 at 12:31 PM, John Peterson <jwpeter...@gmail.com> wrote:
>
>
>
> On Tue, Oct 29, 2013 at 11:19 AM, John Peterson <jwpeter...@gmail.com> wrote:
>
>> On Tue, Oct 29, 2013 at 9:32 AM, Cody Permann <codyperm...@gmail.com> wrote:
>>
>>>
>>> On Tue, Oct 29, 2013 at 5:54 AM, ernestol <ernes...@lncc.br> wrote:
>>>
>>> > I am using a cluster with 23 nodes for a total of 184 cores, and
>>> > each node additionally has 16 GB of RAM. I was thinking that the
>>> > problem may be in the code, because if I run on up to 3 processors I
>>> > don't have any problems, but when I try with 4 or more I get this
>>> > problem.
>>>
>>
>> So you have 8 cores per node, and 2 GB of RAM per core, which is pretty
>> standard.
>>
>> I ran your 200^3 code on my Mac workstation and watched the memory usage
>> in Activity Monitor.
>>
>> The results were somewhat surprising as I added cores:
>>
>> 1 core:  2.22 GB/core
>> 2 cores: 4.0 GB/core
>> 3 cores: slightly more than 4.0 GB/core
>> 4 cores: machine went into swap (I think) after approaching about 3.5
>> GB/core, but the code eventually finished
>> 5 cores: machine again went into swap at around 3.3 GB/core, but
>> finished eventually
>>
>> My workstation has 20 GB of RAM, so including the OS, I guess I could
>> see how approaching 16 GB might cause it to go into swap.
>>
>> But, what is happening when we go from 1 to 2 cores that causes the
>> memory usage per core to double?!
>>
>> Note that in all cases the memory quickly jumps to about 2.22 GB/core.
>> In the 1-processor case it stays there, but in the 2-5 processor cases,
>> after reaching 2 GB/core, it slowly ramps up to the approximately 4
>> GB/core listed above.
>>
>> This, combined with the error message you received (which comes from
>> Metis), leads me to believe that the partitioner is taking up a ton of
>> memory (the partitioner doesn't run on 1 proc). So the questions become:
>>
>> 1.) Is the partitioner taking up a lot more memory than it conceivably
>> should? (Seems like yes.)
>> 2.) Is it taking up more than it used to? I.e., has a bug been
>> introduced recently? (Metis and Parmetis were last updated in April
>> 2013, so pretty recently, actually.)
>>
>> I don't know whether reverting to a prior version of Metis/Parmetis is
>> easily done at this point, but the relevant hashes where the refresh
>> happened seem to be:
>>
>> e80824e86a
>> 1c4b6a0d12
>>
>> I may take a stab at this after lunch... Cody has been seeing similar
>> issues recently as well.
>>
>
>
> I confirmed that changing the partitioner does seem to reduce the overall
> memory usage appreciably.
>
> LinearPartitioner
> 1 core:  2.22 GB/core
> 2 cores: about 2.7 GB/core peak
> 3 cores: same as 2 cores
> 4 cores: about 2.6 GB/core
>
> CentroidPartitioner
> 1 core:  2.22 GB/core
> 2 cores: about 3 GB/core peak
> 4 cores: about 2.8 GB/core peak
>
> SFCPartitioner
> 1 core:  2.22 GB/core
> 2 cores: slightly > 3 GB/core peak
> 4 cores: almost exactly the same GB/core as the 2-core case
>
> Using Activity Monitor does not provide a huge amount of accuracy, but I
> think the trends are about the same for the Linear, Centroid, and SFC
> partitioners, and they make a lot more sense than the Metis results. In
> particular, I was able to run on 4 cores without going into swap.
>
I just checked out the hash immediately prior to the latest Metis/Parmetis
refresh (git co 5771c42933), ran the same tests again, and got basically
the same results on the 200^3 case.

So I don't think the Metis/Parmetis refresh introduced any new memory
bugs...

Just for the hell of it, I also tried some other problem sizes, and in
going from 1 core to 2 cores (Metis off to Metis on) the memory usage per
core always increases (to within the accuracy of Activity Monitor) by a
factor between 1.5 and 1.9:

100^3: 300 -> 500 MB/core (1.67X)
150^3: 975 -> 1700 MB/core (1.75X)
175^3: 1.5 -> 2.8 GB/core (1.87X)
200^3: 2.22 -> 4 GB/core (1.80X)
225^3: 3.15 -> 4.75 GB/core (1.5X)

So I guess it's possible that Metis has always been like this, but we just
haven't noticed it because we don't run problems this big (with SerialMesh)
very often?

Also, the memory usage does go back down after the partitioning step is
complete, so as long as you can survive the memory spike, you can still
run an actual problem...
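
In the meantime, if the spike is a showstopper, swapping in one of the
cheaper partitioners before the mesh is built should avoid Metis entirely.
A minimal sketch, assuming the communicator-aware Mesh constructor (the
exact API may differ slightly with your libMesh version):

#include "libmesh/libmesh.h"
#include "libmesh/mesh.h"
#include "libmesh/mesh_generation.h"
#include "libmesh/linear_partitioner.h"

using namespace libMesh;

int main (int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  Mesh mesh (init.comm());

  // Swap in the cheap partitioner before the mesh is built.
  // partitioner() hands back a smart pointer reference, so reset()
  // transfers ownership of the new object to the mesh.
  mesh.partitioner().reset (new LinearPartitioner);

  // build_cube() calls prepare_for_use(), which is where the
  // (re)partitioning, and hence the Metis memory spike, happens.
  MeshTools::Generation::build_cube (mesh, 200, 200, 200);

  return 0;
}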

We have a more fine-grained memory checker tool here that I'm going to try
in a bit, and I'm also going to try the same tests with
ParallelMesh/Parmetis.

Ben, it looks like we currently base our partitioning algorithm choice
solely on the number of partitions... Do you recall if PartGraphKway is
any more memory-efficient than the PartGraphRecursive algorithm? If so,
perhaps we could base the choice on the size of the mesh requested as well
as the number of partitions (see the sketch below)... I might experiment
with this a bit as well.
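
Purely to illustrate the idea, the selection could look something like
this. The helper name and the element-count threshold are made up, and
I'm assuming the 8-piece cutoff that I believe MetisPartitioner uses
today:

#include "libmesh/id_types.h"

namespace libMesh
{

// Hypothetical helper: pick the METIS algorithm from both the number of
// partitions and the mesh size, instead of the partition count alone.
enum MetisAlgorithm { PART_GRAPH_RECURSIVE, PART_GRAPH_KWAY };

inline MetisAlgorithm
choose_metis_algorithm (const unsigned int n_pieces,
                        const dof_id_type n_active_elem)
{
  // Made-up threshold, to be tuned once we know which algorithm is
  // actually lighter on memory.
  const dof_id_type large_mesh_threshold = 4000000;

  // Recursive bisection for a handful of pieces on a modest mesh; fall
  // back to k-way when either quantity gets large.
  if (n_pieces <= 8 && n_active_elem < large_mesh_threshold)
    return PART_GRAPH_RECURSIVE;

  return PART_GRAPH_KWAY;
}

} // namespace libMesh

MetisPartitioner would then consult something like this before choosing
between the METIS_PartGraphRecursive and METIS_PartGraphKway calls.
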
--
John