Hello all

There has been some recent activity aimed at reducing memory "leaks" from
within the Open MPI code base, including OpenRTE. These efforts are most
welcome and long overdue. They have, though, prompted a couple of questions
to me about why we used malloc so extensively within OpenRTE. Rather than
answer these individually, I thought it might help if I documented this
history for
future participants.

The decision to use dynamic, as opposed to static, memory allocation as our
"standard" method within OpenRTE was made at an ORTE design meeting
approximately two years ago. The overarching reasons for that decision were
four-fold:

1. we didn't want to introduce any system-level constraints on sizes for
things like arrays or strings;

2. given the large degree of flexibility in the system, only a small
percentage of all code paths might be exercised in any given job. Static
memory allocation would therefore have caused the overall memory footprint
for OpenRTE to include storage for data that would likely never be used -
whereas dynamic allocation would ensure we only consumed as much memory as
required for that particular code path;

3. tracking down memory corruption is generally much more difficult than
plugging memory leaks. Given (1), we either would have to continually check
the size of data being given to us to ensure we weren't overrunning static
allocations, or we would have to spend considerable time and effort tracking
down memory corruption problems. We felt that it would be more time
efficient (from a development standpoint) to avoid these problems and just
malloc the memory - and then use valgrind and other tools to eventually plug
any resulting leaks. Every now and then we do make a pass at reducing the
worst of the leakage, but no really concerted effort has been made to date
as it just hasn't been enough of a problem to merit a high priority; and

4. the performance impact of using malloc was considered inconsequential to
the OpenRTE functional requirements. Current measurements show that the
total time to traverse a launch procedure is a few milliseconds (not
including the time to send xcast stage gate messages - see my other note on
scalability issues). This is well within any functional requirement
expressed to date, so obviously the use of malloc hasn't created a major
problem in that regard.
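
To make the tradeoff in (1) and (3) concrete, here is a minimal sketch in C.
It is not taken from the OpenRTE code base: the function names and the
STATIC_NAME_MAX limit are hypothetical, chosen only for illustration. The
static version has to check the size of every input and reject anything too
long, while the dynamic version simply sizes the storage to the data and
leaves the caller responsible for the matching free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed limit, for illustration only. */
#define STATIC_NAME_MAX 16

/* Static approach: the destination has a fixed size, so every copy must
 * check the input length first. Returns 0 on success, or -1 if the input
 * (plus its terminating NUL) would not fit. Omitting this check is what
 * leads to memory corruption. */
int copy_name_static(char dst[STATIC_NAME_MAX], const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= STATIC_NAME_MAX) {
        return -1;
    }
    memcpy(dst, src, len + 1);   /* +1 copies the terminating NUL */
    return 0;
}

/* Dynamic approach: allocate exactly what the input needs, so there is no
 * built-in size limit. Returns NULL if malloc fails. The caller owns the
 * result and must free() it; forgetting to do so is a leak, which tools
 * such as valgrind can report. */
char *copy_name_dynamic(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);
    return copy;
}
```

With these definitions, a 31-character name is rejected by copy_name_static
but handled by copy_name_dynamic. The worst outcome of a mistake in the
dynamic path is a forgotten free, which is the kind of problem we judged
easier to track down than an overrun.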

As a result, the code contains a number of malloc/free combinations that
typically involve small quantities of memory. Many of these are in debugging
code that only gets called if/when a developer requests it be run - I
generally ignore these as that is not a code path that ever gets exercised
during normal operations. Of the remainder, please feel free to plug leaks
and/or consider alternatives. Just please keep in mind the prior
considerations when making your changes.

Hope that helps.
Ralph

