On Fri, 22 Oct 2010, John Peterson wrote:

On Fri, Oct 22, 2010 at 2:46 AM, Tim Kroeger
<[email protected]> wrote:
On Thu, 21 Oct 2010, Roy Stogner wrote:

It looks like there's a deadlock in here in parallel?  You're doing
blocking sends followed by blocking receives?  Probably need to switch
those (at least the sends, but optimally both) to non-blocking.

Good point.  It worked for me, but apparently I've just been lucky.

I didn't find any place in the library where non-blocking communication is
used so far, so could you please double-check that I've now done it right?

It's difficult to search for them since we've named them both "send"
in our C++ MPI interface, but there is an example of a non-blocking
send in mesh_communication.C, line 333.

I see; I was searching for "nonblocking_send", which is also defined in the library but apparently not used. I hadn't noticed that there is also a non-blocking variant that is simply named "send".

(BTW: Sorry for the delay; a lot of different work popped up in the meantime...)

I don't think that non-blocking receives would gain much performance here.

One paradigm is to post all the non-blocking receives before doing all
of the non-blocking sends.  I guess the idea is that all the processors
are then "ready" to receive their message(s), no matter what order the
non-blocking sends are actually completed in.

Good idea, but I see that this isn't done at the point you mentioned either, is it? I guess it's not easy to do this if you don't know in advance how much data you're going to receive.
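
(For what it's worth, in raw MPI terms -- not libMesh's Parallel interface -- the paradigm you describe would look roughly like the sketch below. It assumes every processor already knows how many entries it will receive from each source, which is exactly the piece of information that's missing here; the buffer types and names are made up.)

        #include <mpi.h>
        #include <vector>

        // Sketch only: post every non-blocking receive first, then all the
        // non-blocking sends, then wait for everything.  Assumes the buffers
        // are non-empty and recv_size[p] is known in advance; zero-length
        // messages are not special-cased.
        void exchange (std::vector<std::vector<double> > &send_buf,
                       std::vector<std::vector<double> > &recv_buf,
                       const std::vector<int>            &recv_size,
                       int n_procs)
        {
          std::vector<MPI_Request> recv_req (n_procs), send_req (n_procs);

          recv_buf.resize (n_procs);
          for (int p=0; p<n_procs; ++p)
            {
              recv_buf[p].resize (recv_size[p]);
              MPI_Irecv (&recv_buf[p][0], recv_size[p], MPI_DOUBLE,
                         p, /*tag=*/0, MPI_COMM_WORLD, &recv_req[p]);
            }

          for (int p=0; p<n_procs; ++p)
            MPI_Isend (&send_buf[p][0], (int)send_buf[p].size(), MPI_DOUBLE,
                       p, /*tag=*/0, MPI_COMM_WORLD, &send_req[p]);

          MPI_Waitall (n_procs, &recv_req[0], MPI_STATUSES_IGNORE);
          MPI_Waitall (n_procs, &send_req[0], MPI_STATUSES_IGNORE);
        }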

Also, in the code that you mentioned, something else happens which I think is dangerous: you have a

        std::vector<Parallel::Request> node_send_requests;

(and a number of similar vectors), and then, each time you want to send something, you do

        node_send_requests.push_back(Parallel::request());
        Parallel::send(...,node_send_requests.back(),...);

While this looks perfectly alright, I noticed that there is a subtle problem with it (I tried to do the same thing in my code, found it not working, and traced it back to this): push_back() may have to re-allocate memory and copy all existing elements. In that case, the copy constructor and then the destructor are called for every previous request. But the destructor calls MPI_Request_free(), which makes the copy become invalid.
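
(Here is a tiny stand-alone illustration of that mechanism -- it has nothing to do with libMesh or MPI, just a made-up Handle class whose destructor pretends to release a resource the way MPI_Request_free() does:)

        #include <cstdio>
        #include <vector>

        struct Handle
        {
          Handle ()               { std::printf ("construct\n"); }
          Handle (const Handle &) { std::printf ("copy\n"); }
          ~Handle ()              { std::printf ("destroy (this is where "
                                                 "MPI_Request_free() would run)\n"); }
        };

        int main ()
        {
          std::vector<Handle> v;
          for (int i=0; i<4; ++i)
            // Whenever push_back() triggers a reallocation, every element
            // stored so far is copy-constructed into the new storage and the
            // old element is destroyed -- i.e. its resource is freed while
            // the copy still refers to it.
            v.push_back (Handle());
          return 0;
        }

Running it shows a burst of copy/destroy pairs at every reallocation.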

The easiest way I can think of to avoid that problem is to fill the vector with empty requests *before* you actually send something and then leave the vector length fixed. More sophisticated (but also cleaner) solutions to this problem are conceivable, though.
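
(In raw MPI terms, what I mean is roughly the sketch below; the names, and the assumption that the number of sends is known beforehand, are made up:)

        #include <mpi.h>
        #include <vector>

        // Sketch only: the request vector is sized once, before any send is
        // posted, so it can never be reallocated while requests are in flight.
        void post_sends (std::vector<std::vector<double> > &send_buf,
                         const std::vector<int>            &dest)
        {
          const int n_sends = (int)send_buf.size();
          std::vector<MPI_Request> node_send_requests (n_sends);

          for (int k=0; k<n_sends; ++k)
            MPI_Isend (&send_buf[k][0], (int)send_buf[k].size(), MPI_DOUBLE,
                       dest[k], /*tag=*/0, MPI_COMM_WORLD,
                       &node_send_requests[k]);

          // The buffers and the requests must stay alive until completion.
          MPI_Waitall (n_sends, &node_send_requests[0], MPI_STATUSES_IGNORE);
        }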

Let me know if I missed some reason why this problem cannot occur.

In practice, it
probably doesn't make much difference, but it's always nice to avoid
potential deadlocks by using non-blocking versions of the
communication routines.

Yes, but non-blocking sends should be enough to avoid deadlocks. I'll leave my code unchanged then (with non-blocking sends and blocking receives) unless it turns out to be a bottleneck. Of course, if somebody else wants to improve this (once it has been checked in), I wouldn't object.
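
(Concretely, the pattern I'm keeping is, in raw MPI terms, roughly the following -- again only a sketch with made-up names. Because all the sends are non-blocking, no processor can get stuck inside a send, and the blocking receives can use MPI_Probe/MPI_Get_count, so the message sizes need not be known in advance.)

        #include <mpi.h>
        #include <vector>

        // Sketch only: non-blocking sends, blocking probe-then-receive.
        // Zero-length messages are not special-cased.
        void exchange (std::vector<std::vector<double> > &send_buf,
                       std::vector<std::vector<double> > &recv_buf,
                       int n_procs)
        {
          std::vector<MPI_Request> send_req (n_procs);

          for (int p=0; p<n_procs; ++p)
            MPI_Isend (&send_buf[p][0], (int)send_buf[p].size(), MPI_DOUBLE,
                       p, /*tag=*/0, MPI_COMM_WORLD, &send_req[p]);

          recv_buf.resize (n_procs);
          for (int p=0; p<n_procs; ++p)
            {
              MPI_Status status;
              int count = 0;
              MPI_Probe (p, /*tag=*/0, MPI_COMM_WORLD, &status);
              MPI_Get_count (&status, MPI_DOUBLE, &count);
              recv_buf[p].resize (count);
              MPI_Recv (&recv_buf[p][0], count, MPI_DOUBLE,
                        p, /*tag=*/0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }

          MPI_Waitall (n_procs, &send_req[0], MPI_STATUSES_IGNORE);
        }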

Best Regards,

Tim

--
Dr. Tim Kroeger
CeVis -- Center of Complex Systems and Visualization
University of Bremen              [email protected]
Universitaetsallee 29             [email protected]
D-28359 Bremen                             Phone +49-421-218-7710
Germany                                    Fax   +49-421-218-4236