On Fri, 22 Oct 2010, John Peterson wrote:

On Fri, Oct 22, 2010 at 2:46 AM, Tim Kroeger
<[email protected]> wrote:
On Thu, 21 Oct 2010, Roy Stogner wrote:

It looks like there's a deadlock in here in parallel?  You're doing
blocking sends followed by blocking receives?  Probably need to switch
those (at least the sends, but optimally both) to non-blocking.

Good point.  It worked for me, but apparently I've just been lucky.

I didn't find any place in the library where non-blocking communication is
used so far, so could you please double-check that I've now done it right?

It's difficult to search for them since we've named them both "send"
in our C++ MPI interface, but there is an example of a non-blocking
send in mesh_communication.C, line 333.

I see; I was searching for "nonblocking_send", which is also defined in the library but apparently not used. I hadn't noticed that there is also a non-blocking variant that is simply named "send".

(BTW: Sorry for the delay; a lot of different work popped up in the meantime...)

I don't think that non-blocking receives would gain much performance here.

One paradigm is to post all the non-blocking receives before doing all
of the non-blocking sends.  I guess the idea is that all the processors
are then "ready" to receive their message(s), no matter what order the
non-blocking sends are actually completed in.

Good idea, but I see that this isn't done at the point you mentioned either, is it? I guess it's not easy to do this if you don't know in advance how much data you're going to receive.
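
(For what it's worth, in raw MPI terms -- not libMesh's Parallel interface -- the paradigm you describe would look roughly like the sketch below. It assumes every processor already knows how many entries it will receive from each source, which is exactly the piece of information that's missing here; the buffer types and names are made up.)

        #include <mpi.h>
        #include <vector>

        // Sketch only: post every non-blocking receive first, then all the
        // non-blocking sends, then wait for everything.  Assumes the buffers
        // are non-empty and recv_size[p] is known in advance; zero-length
        // messages are not special-cased.
        void exchange (std::vector<std::vector<double> > &send_buf,
                       std::vector<std::vector<double> > &recv_buf,
                       const std::vector<int>            &recv_size,
                       int n_procs)
        {
          std::vector<MPI_Request> recv_req (n_procs), send_req (n_procs);

          recv_buf.resize (n_procs);
          for (int p=0; p<n_procs; ++p)
            {
              recv_buf[p].resize (recv_size[p]);
              MPI_Irecv (&recv_buf[p][0], recv_size[p], MPI_DOUBLE,
                         p, /*tag=*/0, MPI_COMM_WORLD, &recv_req[p]);
            }

          for (int p=0; p<n_procs; ++p)
            MPI_Isend (&send_buf[p][0], (int)send_buf[p].size(), MPI_DOUBLE,
                       p, /*tag=*/0, MPI_COMM_WORLD, &send_req[p]);

          MPI_Waitall (n_procs, &recv_req[0], MPI_STATUSES_IGNORE);
          MPI_Waitall (n_procs, &send_req[0], MPI_STATUSES_IGNORE);
        }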

Also, in the code that you mentioned, something else happens which I think is dangerous: you have a

        std::vector<Parallel::Request> node_send_requests;

(and a number of similar vectors), and then, each time you want to send something, you do

        node_send_requests.push_back(Parallel::request());
        Parallel::send(...,node_send_requests.back(),...);

While this looks perfectly alright, I noticed that there is a subtle problem with it (I tried to do the same thing in my code, found it not working, and traced it back to this): push_back() may have to re-allocate memory and copy all existing elements. In that case, the copy constructor and then the destructor are called for every previous request. But the destructor calls MPI_Request_free(), which makes the copy become invalid.
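
(Here is a tiny stand-alone illustration of that mechanism -- it has nothing to do with libMesh or MPI, just a made-up Handle class whose destructor pretends to release a resource the way MPI_Request_free() does:)

        #include <cstdio>
        #include <vector>

        struct Handle
        {
          Handle ()               { std::printf ("construct\n"); }
          Handle (const Handle &) { std::printf ("copy\n"); }
          ~Handle ()              { std::printf ("destroy (this is where "
                                                 "MPI_Request_free() would run)\n"); }
        };

        int main ()
        {
          std::vector<Handle> v;
          for (int i=0; i<4; ++i)
            // Whenever push_back() triggers a reallocation, every element
            // stored so far is copy-constructed into the new storage and the
            // old element is destroyed -- i.e. its resource is freed while
            // the copy still refers to it.
            v.push_back (Handle());
          return 0;
        }

Running it shows a burst of copy/destroy pairs at every reallocation.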

The easiest way I can think of to avoid that problem is to fill the vector with empty requests *before* you actually send something and then leave the vector length fixed. More sophisticated (but also cleaner) solutions to this problem are conceivable, though.
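
(In raw MPI terms, what I mean is roughly the sketch below; the names, and the assumption that the number of sends is known beforehand, are made up:)

        #include <mpi.h>
        #include <vector>

        // Sketch only: the request vector is sized once, before any send is
        // posted, so it can never be reallocated while requests are in flight.
        void post_sends (std::vector<std::vector<double> > &send_buf,
                         const std::vector<int>            &dest)
        {
          const int n_sends = (int)send_buf.size();
          std::vector<MPI_Request> node_send_requests (n_sends);

          for (int k=0; k<n_sends; ++k)
            MPI_Isend (&send_buf[k][0], (int)send_buf[k].size(), MPI_DOUBLE,
                       dest[k], /*tag=*/0, MPI_COMM_WORLD,
                       &node_send_requests[k]);

          // The buffers and the requests must stay alive until completion.
          MPI_Waitall (n_sends, &node_send_requests[0], MPI_STATUSES_IGNORE);
        }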

Let me know if I missed some reason why this problem cannot occur.

In practice, it
probably doesn't make much difference, but it's always nice to avoid
potential deadlocks by using non-blocking versions of the
communication routines.

Yes, but non-blocking sends should be enough to avoid deadlocks. I'll leave my code unchanged then (with non-blocking sends and blocking receives) unless it turns out to be a bottleneck. Of course, if somebody else wants to improve this (once it has been checked in), I wouldn't object.
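
(Concretely, the pattern I'm keeping is, in raw MPI terms, roughly the following -- again only a sketch with made-up names. Because all the sends are non-blocking, no processor can get stuck inside a send, and the blocking receives can use MPI_Probe/MPI_Get_count, so the message sizes need not be known in advance.)

        #include <mpi.h>
        #include <vector>

        // Sketch only: non-blocking sends, blocking probe-then-receive.
        // Zero-length messages are not special-cased.
        void exchange (std::vector<std::vector<double> > &send_buf,
                       std::vector<std::vector<double> > &recv_buf,
                       int n_procs)
        {
          std::vector<MPI_Request> send_req (n_procs);

          for (int p=0; p<n_procs; ++p)
            MPI_Isend (&send_buf[p][0], (int)send_buf[p].size(), MPI_DOUBLE,
                       p, /*tag=*/0, MPI_COMM_WORLD, &send_req[p]);

          recv_buf.resize (n_procs);
          for (int p=0; p<n_procs; ++p)
            {
              MPI_Status status;
              int count = 0;
              MPI_Probe (p, /*tag=*/0, MPI_COMM_WORLD, &status);
              MPI_Get_count (&status, MPI_DOUBLE, &count);
              recv_buf[p].resize (count);
              MPI_Recv (&recv_buf[p][0], count, MPI_DOUBLE,
                        p, /*tag=*/0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }

          MPI_Waitall (n_procs, &send_req[0], MPI_STATUSES_IGNORE);
        }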

Best Regards,

Tim

--
Dr. Tim Kroeger
CeVis -- Center of Complex Systems and Visualization
University of Bremen              [email protected]
Universitaetsallee 29             [email protected]
D-28359 Bremen                             Phone +49-421-218-7710
Germany                                    Fax   +49-421-218-4236