> On Nov 2, 2013, at 7:01 AM, "Kirk, Benjamin (JSC-EG311)"
> <benjamin.k...@nasa.gov> wrote:
>
>> On Nov 2, 2013, at 7:42 AM, "Kirk, Benjamin (JSC-EG311)"
>> <benjamin.k...@nasa.gov> wrote:
>>
>>> On Nov 1, 2013, at 11:15 PM, "John Peterson" <jwpeter...@gmail.com> wrote:
>>>
>>> Finally, note that we call adjncy.reserve(graph_size); where
>>> graph_size=48M. This was most likely done for speed, but it means that
>>> the full memory for both 'graph' and 'adjncy' is required at the same
>>> time. We could instead try letting the size of adjncy grow as it's
>>> filled in, and check to see if it slows down the code appreciably... but
>>> I would expect this to be a much smaller memory savings than somehow
>>> refactoring 'graph' would be.
>>
>> Yeah, and it's actually gotta be worse than that - we use push_back to
>> build the graph rows, so for the case of 6 neighbors, vector doubling
>> would make row.capacity()=8.
>>
>> This is an excellent example of where the simple implementation works but
>> has now become too simple.
>>
>> Building the adjacency graph directly should work, but will require some
>> careful code. Also, depending on the order we process the elements, we
>> could be inserting near the end of the 48M vector, or randomly in the
>> middle.
>>
>> Probably too early to worry about that part, but if it is really slow in
>> the 'natural' order it may be worth building the adjacency graph by
>> looping over the elements in such a way that we are only inserting into
>> the end.
>>
>> (As an aside, I wonder if inserting into the middle or front of a deque
>> is faster than a vector?)
>
> Ok, clearly there is ample room to reduce the memory footprint of our
> metis prep, and some of that will carry over to other areas of the code
> too (vectormap). But I guess one obvious thing is: why have all cores
> call metis anyway?
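[Editorial note: the "build the adjacency graph directly" idea above can be sketched as a two-pass CSR construction. This is a hypothetical illustration, not libMesh's actual code; `CSRGraph`, `build_csr`, and the callback interface are made-up names. Pass 1 counts each element's neighbors so the offsets are exact; pass 2 writes neighbor ids straight into their final slots. There are no per-row vectors, so no push_back capacity doubling (no capacity 8 for 6 neighbors), and the intermediate `graph` and the final `adjncy` never need to coexist.]

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical two-pass CSR construction (illustrative names).
struct CSRGraph
{
  std::vector<std::size_t> xadj;   // row offsets, size n_elem+1
  std::vector<int>         adjncy; // concatenated neighbor lists
};

// count_neighbors(e) -> number of neighbors of element e.
// fill_row(e, row)   -> writes element e's neighbor ids starting at row.
template <typename CountFn, typename FillFn>
CSRGraph build_csr (std::size_t n_elem,
                    CountFn count_neighbors,
                    FillFn fill_row)
{
  CSRGraph g;
  g.xadj.assign (n_elem + 1, 0);

  // Pass 1: exact per-row counts; a prefix sum turns them into offsets.
  for (std::size_t e = 0; e != n_elem; ++e)
    g.xadj[e + 1] = count_neighbors (e);
  std::partial_sum (g.xadj.begin(), g.xadj.end(), g.xadj.begin());

  // Pass 2: size adjncy once, then write each row into its final slot.
  g.adjncy.resize (g.xadj.back());
  for (std::size_t e = 0; e != n_elem; ++e)
    fill_row (e, &g.adjncy[g.xadj[e]]);

  return g;
}
```

[On the deque aside: a std::deque has constant-time insertion at either end, but insertion in the middle is still linear, and its storage is not contiguous, so the data would have to be copied into a flat array before handing it to METIS anyway.]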
>
> 12 years ago I guess I was in the one-rank-per-machine mindset, so this
> approach would have no memory contention, but now that's definitely not
> the case.
>
> So in addition, I think pretty much everything after the
> find_global_indices() call (that is parallel_only()) should be done on
> rank 0, and the results broadcast!! Doh!
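[Editorial note: the "do it on rank 0 and broadcast" pattern proposed above might look like the sketch below. The `Communicator` here is a hypothetical serial stand-in for an MPI wrapper, and the block partition stands in for the actual METIS call; in real code rank 0 would build the graph, call METIS, and MPI_Bcast the partition vector to the other ranks.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical communicator: a serial stub standing in for an MPI wrapper.
// In a real run, rank() would be the MPI rank and broadcast() an MPI_Bcast.
struct Communicator
{
  int rank () const { return 0; }               // always rank 0 in serial
  void broadcast (std::vector<int> &) const {}  // no-op in serial
};

// Only rank 0 pays the memory cost of the graph + partitioner; every other
// rank gets the answer for the price of a single broadcast.
std::vector<int> partition_on_rank0 (const Communicator & comm,
                                     std::size_t n_elem,
                                     unsigned n_parts)
{
  std::vector<int> part (n_elem);

  if (comm.rank() == 0)
    {
      // Stand-in for building the adjacency graph and calling METIS:
      // a trivial block partition of the elements.
      for (std::size_t e = 0; e != n_elem; ++e)
        part[e] = static_cast<int>((e * n_parts) / n_elem);
    }

  comm.broadcast (part);
  return part;
}
```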
Damn... Yeah, this would clearly help the most for parallel runs. No reason
to really optimize anything else until that's implemented!

> (And I second Roy's plea to get this magnificent tool into contrib/utils)

Ok, let me check with the authors...

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel