Apologies for the late reply; I've been sick this week, and although
I'm mostly better my head's still not 100% back in the game.
My slightly-untrustworthy thoughts:
Yes, this is definitely a serious, urgent issue. It's the kind of bug
that can go untriggered under some STL/MPI implementation combinations
(like the ones in our regression tests...) but can be fatal under others.
As best I can tell from the MPI 1.1 standard docs, a request is
deallocated by a successful MPI_Wait even if MPI_Request_free is never
called. So as an immediate fix I'd just remove the MPI_Request_free
call from the destructor - that fixes the library code, and at worst
it replaces a potential error with a trivial memory leak in user code.
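For the curious, here's a minimal standalone illustration of those
semantics (plain MPI, no libMesh involved); if your MPI behaves per
the standard, both asserts should pass, since MPI_Wait deallocates
each completed request and resets its handle:

#include <mpi.h>
#include <cassert>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  int sbuf = 1, rbuf = 0;
  MPI_Request sreq, rreq;
  MPI_Irecv(&rbuf, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &rreq);
  MPI_Isend(&sbuf, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &sreq);

  MPI_Wait(&sreq, MPI_STATUS_IGNORE);
  MPI_Wait(&rreq, MPI_STATUS_IGNORE);

  // MPI_Wait has already freed each completed request and set the
  // handle back to MPI_REQUEST_NULL - no MPI_Request_free needed.
  assert(sreq == MPI_REQUEST_NULL);
  assert(rreq == MPI_REQUEST_NULL);

  MPI_Finalize();
  return 0;
}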
std::list isn't a fix - even if it were guaranteed to work for
library code, we can't force the same fix on user code. And we
shouldn't - even the C++0x people seem to have realized that the
STL+auto_ptr incompatibility was a big mistake.
Long-term, there are two policy options that I think make sense:
1. Don't ever do MPI_Request_free implicitly from a Request object.
2. Do reference counting in Request objects, and only call
MPI_Request_free when the last reference is destroyed (a rough
sketch follows below).
I can see pros and cons to each; I'd appreciate others' opinions.
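To make option 2 concrete, here's roughly what the reference counting
could look like - using std::shared_ptr (or std::tr1::shared_ptr)
purely for brevity; a hand-rolled counter would work the same way,
and the names here are illustrative, not the actual Parallel::Request
API:

#include <mpi.h>
#include <memory>

class RefCountedRequest
{
public:
  RefCountedRequest()
    : _request(new MPI_Request(MPI_REQUEST_NULL), free_request) {}

  MPI_Request * get() { return _request.get(); }

private:
  // Runs only when the last copy of this object goes away.
  static void free_request(MPI_Request * r)
  {
    if (*r != MPI_REQUEST_NULL)
      MPI_Request_free(r);
    delete r;
  }

  std::shared_ptr<MPI_Request> _request;
};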
---
Roy
On Wed, 3 Nov 2010, John Peterson wrote:
On Wed, Nov 3, 2010 at 12:32 PM, Kirk, Benjamin (JSC-EG311)
<[email protected]> wrote:
On Fri, 22 Oct 2010, John Peterson wrote:
Also, in the code you mentioned, something else happens that I think
is dangerous: you have a

std::vector<Parallel::Request> node_send_requests;

(and a number of similar vectors), and then, each time you want to
send something, you do

node_send_requests.push_back(Parallel::request());
Parallel::send(...,node_send_requests.back(),...);

While this looks perfectly all right, there is a subtle problem with
it (I tried to do the same thing in my own code, found it didn't
work, and traced it back to this): push_back() may have to reallocate
memory and copy all existing elements. In that case the copy
constructor, and then the destructor, is called for every previous
request. But the destructor calls MPI_Request_free(), which
invalidates the copy.
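To make the failure concrete, here is a self-contained (and
deliberately broken) plain-MPI reproduction of the pattern.
BadRequest is a hypothetical stand-in that mimics a destructor
calling MPI_Request_free, not the actual Parallel::Request class;
depending on the MPI implementation this program may run, hang, or
crash - which is exactly why the bug evades some test setups:

#include <mpi.h>
#include <vector>

struct BadRequest
{
  MPI_Request req;
  BadRequest() : req(MPI_REQUEST_NULL) {}
  ~BadRequest()
  {
    // Frees the handle that any copy of this object still holds.
    if (req != MPI_REQUEST_NULL)
      MPI_Request_free(&req);
  }
};

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  const int n = 100;
  std::vector<int> send_bufs(n, 42), recv_bufs(n);
  std::vector<MPI_Request> recv_reqs(n);
  std::vector<BadRequest> send_reqs;

  for (int i = 0; i != n; ++i)
    {
      // Matching receives so the self-sends can complete.
      MPI_Irecv(&recv_bufs[i], 1, MPI_INT, 0, i, MPI_COMM_SELF,
                &recv_reqs[i]);

      send_reqs.push_back(BadRequest());
      // If that push_back reallocated, every earlier element was
      // copied and then destroyed - and each destructor called
      // MPI_Request_free on a request its copy still references.
      MPI_Isend(&send_bufs[i], 1, MPI_INT, 0, i, MPI_COMM_SELF,
                &send_reqs.back().req);
    }

  // Waiting on the (possibly dangling) handles is where things blow
  // up, implementation permitting.
  for (int i = 0; i != n; ++i)
    MPI_Wait(&send_reqs[i].req, MPI_STATUS_IGNORE);

  MPI_Waitall(n, &recv_reqs[0], MPI_STATUSES_IGNORE);
  MPI_Finalize();
  return 0;
}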
Wow, yeah... this is a serious issue we should resolve ASAP. It's
like putting an auto_ptr in an STL container... a big C++ no-no.
What do you think, Roy?
We could change the Request class's destructor so that it does not
attempt to manage the request's lifetime... as I understand it, it's
not necessary to call MPI_Request_free unless you actually want to
cancel an Isend/Irecv operation. For that we could provide an
explicit Request::free() interface. Otherwise I don't think we leak
memory, since there is no malloc/new involved in the first place.
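Roughly what I have in mind - just a sketch, and apart from the
proposed Request::free() the names are made up rather than our
existing Parallel API:

#include <mpi.h>

class Request
{
public:
  Request() : _request(MPI_REQUEST_NULL) {}

  // Deliberately empty: destruction no longer touches the MPI
  // handle, so copies made by STL containers stay valid.
  ~Request() {}

  // Explicit, opt-in release for callers who really do want to
  // abandon a pending Isend/Irecv.
  void free()
  {
    if (_request != MPI_REQUEST_NULL)
      MPI_Request_free(&_request);
  }

  void wait() { MPI_Wait(&_request, MPI_STATUS_IGNORE); }

  MPI_Request * get() { return &_request; }

private:
  MPI_Request _request;
};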
Hmm... That definitely sounds like the right approach to me. But if
for some reason it doesn't work, we could use a
std::list<Parallel::Request> instead.
I'm pretty sure there is a blanket ban on non-copyable, non-assignable
objects in *all* STL containers.
The Parallel::Request object isn't (currently) truly copyable, since
you can't destroy the original without affecting the copy.
Although std::list itself might never actually need to make a Request
copy, I wouldn't want to take the chance...
--
John