Apologies for the late reply; I've been sick this week, and although
I'm mostly better my head's still not 100% back in the game.

My slightly-untrustworthy thoughts:

Yes, this is definitely a serious urgent issue.  It's the kind of bug
that can avoid being triggered by some STL/MPI implementations (like
our regression tests...) but can be fatal in others.

As best as I can tell from the MPI 1.1 standard docs, requests will be
deallocated by a completed MPI_Wait even without MPI_Request_free ever
being called.  So as an immediate fix I'd just remove the
MPI_Request_free call - that'll fix library code and at worst it'll
replace a potential error with a trivial memory leak in user code.

std::list isn't a fix - even if it was guaranteed to work for library
code we can't force the same fix in user code.  And we shouldn't -
even the C++0x people seem to have realized that the STL+auto_ptr
incompatibility was a big mistake.

Long-term, there are two policy options that I think make sense:

1. Don't ever do MPI_Request_free implicitly from a Request object.

2. Do reference counting in Request objects; do an MPI_Request_free
when the last reference is destroyed.

I can see pros and cons to each; I'd appreciate others' opinions.
---
Roy

On Wed, 3 Nov 2010, John Peterson wrote:

On Wed, Nov 3, 2010 at 12:32 PM, Kirk, Benjamin (JSC-EG311)
<[email protected]> wrote:
<[email protected]> wrote:
On Fri, 22 Oct 2010, John Peterson wrote:

Also, in the code that you mentioned, another thing happens which I think is
dangerous: That is, you are having a

       std::vector<Parallel::Request> node_send_requests;

(and a number of similar vectors), and then, each time you want to send
something, you do

       node_send_requests.push_back(Parallel::request());
       Parallel::send(...,node_send_requests.back(),...);

While this looks perfectly alright, I noticed that there is a subtle problem
with this (because I tried to do it the same way in my code and found it not
working and traced it back), that is: push_back() may possibly have to
re-allocate memory and copy all elements.  In that case, for all previous
requests, the copy constructor will be called, and then the destructor.  But
the destructor calls MPI_Request_free(), which makes the copy become
invalid.

Wow, yeah... this is a serious issue we should resolve ASAP.  It's
like having an auto_ptr in an STL container ... a big C++ no-no.

What do you think Roy?

We could change the Request class's destructor so that it does not
attempt to manage the lifetime... as I understand it, it's not
necessary to call MPI_Request_free unless you want to actually cancel
an Isend/Irecv operation.  So for that we could have an explicit
Request::free() interface.  Otherwise I don't think we leak memory
since there is no malloc/new in the first place.

Hmm...  That definitely sounds like the right approach to me. But if for
some reason it doesn't work we could use a std::list<Parallel::Request>
instead.

I'm pretty sure there is a blanket ban on non-copyable, non-assignable
objects in *all* STL containers.

The Parallel::Request object isn't (currently) truly copyable, since
you can't destroy the original without affecting the copy.

Although std::list itself might never actually need to make a Request
copy, I wouldn't want to take the chance...

--
John

------------------------------------------------------------------------------
Achieve Improved Network Security with IP and DNS Reputation.
Defend against bad network traffic, including botnets, malware,
phishing sites, and compromised hosts - saving your company time,
money, and embarrassment.   Learn More!
http://p.sf.net/sfu/hpdev2dev-nov
_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
------------------------------------------------------------------------------
The Next 800 Companies to Lead America's Growth: New Video Whitepaper
David G. Thomson, author of the best-selling book "Blueprint to a 
Billion" shares his insights and actions to help propel your 
business during the next growth cycle. Listen Now!
http://p.sf.net/sfu/SAP-dev2dev
_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users

Reply via email to