Could you manually switch to the space-filling curve partitioner (hard-code it?) to see whether that is better behaved?
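As a side note on the figures quoted below, here is a rough back-of-the-envelope check of the memory numbers. It is only a sketch: the variable and function names are mine, the per-core values are John's approximate Activity Monitor readings (not new measurements), and the 3-core entry uses 4.0 as a lower bound for "slightly more than 4.0".

```python
# Cluster figures from ernestol's message.
nodes = 23
total_cores = 184
ram_per_node_gb = 16

cores_per_node = total_cores // nodes                 # cores on each node
ram_per_core_gb = ram_per_node_gb / cores_per_node    # RAM budget per core

# John's approximate readings on his workstation (GB per core).
# The 3-core value is a lower bound ("slightly more than 4.0").
observed_gb_per_core = {1: 2.22, 2: 4.0, 3: 4.0, 4: 3.5, 5: 3.3}
workstation_ram_gb = 20


def total_usage_gb(readings):
    """Multiply each per-core reading by its core count to get total usage."""
    return {n: round(n * per_core, 2) for n, per_core in readings.items()}


totals = total_usage_gb(observed_gb_per_core)

print(f"cluster: {cores_per_node} cores/node, {ram_per_core_gb} GB/core")
for n, total in totals.items():
    print(f"{n} core(s): ~{total} GB of {workstation_ram_gb} GB in use")
```

The 4- and 5-core totals land in the 14-16.5 GB range, which matches John's guess that approaching 16 GB (plus the OS) is where the workstation starts to swap.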
On Oct 29, 2013, at 12:19 PM, "John Peterson" <jwpeter...@gmail.com> wrote:

> On Tue, Oct 29, 2013 at 9:32 AM, Cody Permann <codyperm...@gmail.com> wrote:
>
>> On Tue, Oct 29, 2013 at 5:54 AM, ernestol <ernes...@lncc.br> wrote:
>>
>>> I am using a cluster with 23 nodes for a total of 184 cores, and each
>>> node additionally has 16 GB of RAM. I was thinking that the problem may
>>> be in the code, because if I run on up to 3 processors I don't have any
>>> problems, but when I try with 4 or more I get this problem.
>
> So you have 8 cores per node, and 2 GB of RAM per core, which is pretty
> standard.
>
> I ran your 200^3 code on my Mac workstation and watched the memory usage in
> Activity Monitor.
>
> The results were somewhat surprising as I added cores:
>
> 1 core:  2.22 GB/core
> 2 cores: 4.0 GB/core
> 3 cores: slightly more than 4.0 GB/core
> 4 cores: machine went into swap (I think) after approaching about 3.5
>          GB/core, but the code eventually finished
> 5 cores: machine again went into swap at around 3.3 GB/core, but finished
>          eventually
>
> My workstation has 20 GB of RAM, so including the OS I guess I could see
> how approaching 16 GB might cause it to go into swap.
>
> But what is happening when we go from 1 to 2 cores that causes the memory
> usage per core to double?!
>
> Note that in all cases the memory quickly jumps to about 2.22 GB/core. In
> the 1-processor case it stays there, but in the 2-5 processor cases, after
> reaching 2 GB/core, it slowly ramps up to the approximately 4 GB/core listed
> above.
>
> This, combined with the error message you received (which comes from Metis),
> leads me to believe that the partitioner is taking up a ton of memory
> (the partitioner doesn't run on 1 proc). So the questions become:
>
> 1.) Is the partitioner taking up a lot more memory than it conceivably
>     should? (Seems like yes.)
> 2.) Is it taking up more than it used to?
>     I.e., has a bug been introduced recently? (Metis and Parmetis were
>     last updated in April 2013, so pretty recently actually.)
>
> I don't know whether reverting to a prior version of Metis/Parmetis is
> easily done at this point, but the relevant hashes where the refresh
> happened seem to be:
>
> e80824e86a
> 1c4b6a0d12
>
> I may take a stab at this after lunch... Cody has been seeing similar
> issues recently as well.
>
> --
> John
> _______________________________________________
> Libmesh-users mailing list
> libmesh-us...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libmesh-users

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel