On 2014-01-15 22:36, Mikael Mortensen wrote:
I have had a few test runs and compared my memory usage to a
Navier-Stokes solver from OpenFOAM. Some results thus far for a box of
size 40**3 are (using /proc/pid/statm to measure memory usage, see
https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp
[2]):
1 CPU
OpenFOAM: 324 MB RSS (443 MB VM)
Fenics: 442 MB RSS (mesh 16 MB) + 60 MB for importing dolfin (1.0 GB VM)
2 CPUs
OpenFOAM: 447 MB (1.6 GB VM)
Fenics: 520 MB (mesh 200 MB) + 120 MB for importing dolfin (1.7 GB VM)
4 CPUs
OpenFOAM: 540 MB RSS (2.0 GB VM)
Fenics: 624 MB (mesh 215 MB) + 250 MB for importing dolfin (3.0 GB VM)
Looks pretty good.
This is after Garth's removal of the d--0--d connectivity, and I'm not
attempting to clean out topology or anything after the mesh has been
created. I'm actually very happy with these results :-) More CPUs
require more memory, but it increases more for OpenFOAM than for
Fenics, and the difference is not all that much. Importing dolfin is a
considerable part of the total cost, but that cost does not increase
with mesh size, so it's not of much concern (also, this depends on how
many dependencies you carry around). I added virtual memory in
parentheses, but I'm not sure what to make of it.
Some other things I have learned (very) recently:
1) Clearing out memory during runtime is apparently not
straightforward because, in an effort to optimize speed (allocation is
slow), the system may choose to keep your previously allocated memory
close (in RAM). Even with the swap trick the operating system may
actually choose to keep the allocated memory available and not free it
(see, e.g.,
http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector
[3]). Apparently there is an environment variable, GLIBCXX_FORCE_NEW,
that is intended to disable this caching of memory, but I have not been
able to make it work. With the swap trick memory is definitely freed
quite efficiently, but it is frustrating trying to figure out where
memory goes when we are not in total control of allocation and
deallocation.
One example: I use project to initialize a Function. During project,
RSS memory increases substantially due to, e.g., a matrix being
created behind the scenes. One would expect this memory to be freed
after finishing, but that does not always happen. The additional
memory may be kept in RAM and reused later. If, for example, I create a
matrix just after project, then memory use may actually not increase
at all, because the OS can just reuse the memory it did not free. I
guess this is something one has to live with when using dynamic memory
allocation? I really know very little about how these things work, so
it would be great if someone who does could comment.
I wouldn't worry about this. It can be hard to measure memory via the OS
reporting tools because of the way the OS manages memory, as you
describe above. Just leave it to the OS. You could try KCachegrind to
measure and visualise where memory is being used in DOLFIN.
2) Both MeshGeometry.clear() and MeshTopology.clear() should probably
make use of the swap trick instead of std::vector::clear(). At least
then the memory *may* be freed when clearing the mesh.
We could bite the bullet and start using C++11. It has a shrink_to_fit
function:
http://en.cppreference.com/w/cpp/container/vector/shrink_to_fit
This would be useful when we clear a container and when we know that no
more entries will be added.
3) Memory use for creating a parallel UnitCubeMesh(40, 40, 40) goes
like this:
16 MB in creating the serial mesh
56 MB in creating LocalMeshData class
41 MB in creating cell partition in
MeshPartitioning::build_distributed_mesh
101 MB in creating mesh from LocalMeshData and cell partitioning in
MeshPartitioning::build
No memory is freed after the mesh has been created, even though it
looks to me like LocalMeshData goes out of scope?
LocalMeshData should be destroyed. Take a look with KCachegrind. There
is also gperftools:
https://code.google.com/p/gperftools/
for memory profiling.
Mikael
PS Not really relevant to this test, but my NS solver is 20% faster
than OpenFOAM for this channel-flow test case :-)
Nice. On the same mesh? Does your solver have an extra order of accuracy
over OpenFOAM?
Garth
On Jan 15, 2014, at 10:48 AM, Garth N. Wells wrote:
On 2014-01-15 07:13, Anders Logg wrote:
On Tue, Jan 14, 2014 at 08:08:50PM +0000, Garth N. Wells wrote:
On 2014-01-14 19:28, Anders Logg wrote:
On Tue, Jan 14, 2014 at 05:19:55PM +0000, Garth N. Wells wrote:
On 2014-01-14 16:24, Anders Logg wrote:
On Mon, Jan 13, 2014 at 07:16:01PM +0000, Garth N. Wells wrote:
On 2014-01-13 18:42, Anders Logg wrote:
On Mon, Jan 13, 2014 at 12:45:11PM +0000, Garth N. Wells
wrote:
I've just pushed some changes to master, which for
from dolfin import *
mesh = UnitCubeMesh(128, 128, 128)
mesh.init(2)
mesh.init(2, 3)
give a factor 2 reduction in memory usage and a factor 2
speedup.
Change is at
https://bitbucket.org/fenics-project/dolfin/commits/8265008
[1].
The improvement is primarily due to d--d connectivity not being
computed. I'll submit a pull request to throw an error when d--d
connectivity is requested. The only remaining place where d--d is
(implicitly) computed is in the code for finding constrained mesh
entities (e.g., for periodic bcs). The code in question is
for (MeshEntityIterator e(facet, dim); ...)
when dim is the same as the topological dimension of the facet. As
in other places, it would be consistent (and the original intention
of the programmer) if this just iterated over the facet itself
rather than all facets attached to it via a vertex.
I don't see why an error message is needed. Could we not just add the
possibility to specify what d--d means? It might be useful for other
algorithms to be able to compute that data.
I think it should be removed because it is (i) ad hoc, (ii) not
used/required in the library and (iii) a memory monster.
Moreover, we have dimension-independent algorithms that work when
d--d connectivity is a connection from an entity to itself (for
which we don't need computation and memory eating). We shouldn't
have an unnecessary, memory-monster d--0--d data structure being
created opaquely for no purpose, which is what
for (MeshEntityIterator e(entity, dim); ...) { ... }
does at present when (dimension of entity) = dim. The excessive
memory usage is an issue for big problems.
If a user wants d--0--d it can be built explicitly, which makes both
the purpose and the intention clear.
How can it be built explicitly without calling mesh.init(d, d)?
Build d--0, then 0--d:
for (MeshEntityIterator e0(mesh, d); ...)
  for (VertexIterator v(*e0); ...)
    for (MeshEntityIterator e2(*v, d); ...)
Yes, that's even more explicit, but since we already have a function
that computes exactly that, I don't see the urge to remove it.
Note that d--0--d is no longer used in DOLFIN (except by accident in
finding periodic bcs because of inconsistent behind-the-scenes
behaviour, and it should be changed because it's expensive and not
required for periodic bcs).
Not true - see line 67 of Extrapolation.cpp.
That code is not covered by any test or demo.
Yes it is, by a unit test under test/unit/adaptivity/ and all the
auto-adaptive demos, including the documented demo
auto-adaptive-poisson.
OK. I don't know why it didn't trigger a failure before when I added
an error when d0==d1 and ran the tests.
Having easy access to neighboring cells (however one chooses to define
what it means to be a cell neighbor of a cell) can be useful to many
algorithms that perform local searches or solve local problems.
Sure, but again the user should decide how a connection is
defined.
Yes, I agree.
Again, I don't see the urge to remove this function just because it
consumes a lot of memory. The important point is to avoid using it
for standard algorithms in DOLFIN, like boundary conditions.
That's not a good argument. If poor/slow/memory-hogging code is left
in the library, it eventually gets used in the library despite any
warnings. It's happened time and time again.
I have never understood this urge to remove all functionality that
can potentially be misused. We are adults, so it should be enough to
write a warning in the documentation.
This has proven not to be enough in the past. Moreover, it is good to
make it easy for users to write good code by not providing functions
that are easily misused, and it's good to make it easy to write code
that works well beyond small serial cases.
Garth
The proper way to fix functions that should not use this particular
function (for example, periodic bcs) would be to create an issue for
the misuse.
The periodic bc code does not misuse the function. The flaw is that
the mesh library magically and unnecessarily creates a memory
monster behind the scenes.
I agree with that.
--
Anders
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
Links:
------
[1] https://bitbucket.org/fenics-project/dolfin/commits/8265008
[2] https://github.com/mikaem/fenicstools/blob/master/fem/getMemoryUsage.cpp
[3] http://stackoverflow.com/questions/6526983/free-memory-from-a-long-vector