On 04/13/09 09:40, George Bosilca wrote:

On Apr 12, 2009, at 21:58, Timothy Hayes wrote:

I was wondering if someone might be able to shed some light on a couple of questions I have.

When you receive a fragment/base_descriptor in a BTL module, is the raw data allowed to be fragmented (non-contiguous) when you invoke the callback function? By that I mean: I'm using a circular buffer in each endpoint, so sometimes the data wraps around to the start of the buffer. Currently I'm doing a two-step copy: from my socket to the circular buffer, and then from the circular buffer to the fragment. This affects my total throughput quite a bit; it would be much nicer to just point into the buffer instead. When I tried using two base_segments to point to the start and end of the buffer, I got some pretty strange errors. Could someone confirm whether or not this is allowed? Maybe those errors were just down to human error.

On the descriptor you can set a number of iovecs containing the raw data. You don't have to make the data contiguous before calling up into the PML. I think the PML header has to be contiguous, though, so you have to make sure that the first 32 bytes of the message are contiguous.
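A minimal sketch of the zero-copy idea for a circular buffer, assuming (as described above) that the payload may span several segments but the first 32 bytes must be contiguous. The names seg_t, HDR_CONTIG_BYTES and ring_to_segments are hypothetical; seg_t only mimics an address/length pair and is not the real Open MPI segment type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a BTL segment (address + length).
 * The real Open MPI segment type has its own field names. */
typedef struct {
    void  *addr;
    size_t len;
} seg_t;

/* Assumption from the thread: the first 32 bytes (PML header)
 * must be contiguous. */
#define HDR_CONTIG_BYTES 32

/* Describe msg_len bytes starting at offset head in a ring of ring_size
 * bytes, without copying the payload. Only when the header itself
 * straddles the wrap point are those 32 bytes copied into hdr_scratch.
 * Returns the number of segments filled in (1 or 2). */
static int ring_to_segments(unsigned char *ring, size_t ring_size,
                            size_t head, size_t msg_len,
                            unsigned char *hdr_scratch, seg_t segs[2])
{
    assert(head < ring_size && msg_len <= ring_size);
    assert(msg_len >= HDR_CONTIG_BYTES);

    size_t to_end = ring_size - head;   /* bytes before the wrap point */

    if (msg_len <= to_end) {
        /* No wrap: a single segment pointing into the ring. */
        segs[0].addr = ring + head;
        segs[0].len  = msg_len;
        return 1;
    }
    if (to_end >= HDR_CONTIG_BYTES) {
        /* Wrap occurs after the header: two segments, zero copies. */
        segs[0].addr = ring + head;
        segs[0].len  = to_end;
        segs[1].addr = ring;
        segs[1].len  = msg_len - to_end;
        return 2;
    }
    /* Header straddles the wrap: copy only the header bytes. */
    memcpy(hdr_scratch, ring + head, to_end);
    memcpy(hdr_scratch + to_end, ring, HDR_CONTIG_BYTES - to_end);
    segs[0].addr = hdr_scratch;
    segs[0].len  = HDR_CONTIG_BYTES;

    /* The rest of the payload is contiguous from the ring's start. */
    size_t rest = msg_len - HDR_CONTIG_BYTES;
    if (rest == 0) {
        return 1;
    }
    segs[1].addr = ring + (HDR_CONTIG_BYTES - to_end);
    segs[1].len  = rest;
    return 2;
}
```

With this approach, the ring space (and the scratch header) must stay valid until the upcall into the PML returns, so the read pointer should only advance after the callback completes.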

My other question is about the BTL failover system. Could someone briefly explain how it works, or point me to some docs? I'm expecting the file descriptors in my module to fail at a certain point during an Open MPI job, and I'd like my BTL module to fail gracefully and let the TCP module take over in its place. I'm not sure how to make the BTL module explicitly tell the rest of Open MPI "don't use me anymore", though.

There is no way to say "don't use me at all anymore". This is decided per peer, so you will have to return an error for every peer. Try returning OMPI_ERR_OUT_OF_RESOURCE from all functions that allocate descriptors (_alloc, _prepare_src and _prepare_dst), and the PML will end up removing this BTL from the list.
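A minimal sketch of the per-peer pattern described above, assuming the module keeps a "failed" flag on each endpoint and every descriptor-allocating entry point checks it. All names here (endpoint_t, descriptor_t, sketch_alloc, SKETCH_ERR_OUT_OF_RESOURCE) are hypothetical; the real BTL alloc/prepare functions have different signatures, and SKETCH_ERR_OUT_OF_RESOURCE only stands in for OMPI_ERR_OUT_OF_RESOURCE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical status codes; SKETCH_ERR_OUT_OF_RESOURCE stands in
 * for OMPI_ERR_OUT_OF_RESOURCE. */
#define SKETCH_SUCCESS              0
#define SKETCH_ERR_OUT_OF_RESOURCE -1

/* Hypothetical per-peer endpoint state. */
typedef struct {
    int  fd;       /* socket to this peer */
    bool failed;   /* set once the socket is known to be dead */
} endpoint_t;

/* Hypothetical descriptor with an inline payload. */
typedef struct {
    size_t        size;
    unsigned char payload[];
} descriptor_t;

/* Call this wherever an I/O error is detected on the peer's socket.
 * The failure is recorded per peer, not for the whole module. */
static void endpoint_mark_failed(endpoint_t *ep)
{
    ep->failed = true;
}

/* Every descriptor-allocating path (alloc, prepare_src, prepare_dst in
 * a real BTL) refuses work for a failed peer, so the caller sees a
 * resource error for that peer only. */
static int sketch_alloc(endpoint_t *ep, size_t size, descriptor_t **out)
{
    assert(ep != NULL && out != NULL);
    *out = NULL;
    if (ep->failed) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    descriptor_t *d = malloc(sizeof(*d) + size);
    if (d == NULL) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    d->size = size;
    *out = d;
    return SKETCH_SUCCESS;
}
```

Because the flag lives on the endpoint, a dead socket to one peer does not stop the module from serving other peers whose connections are still healthy.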

  george.


We are also looking at mapping out a BTL when we get an error. Our approach is to register a PML OB1 callback function that is invoked when the BTL encounters an error. This PML OB1 callback function can then map out the BTL via a call to mca_bml.bml_del_btl(btl), which seems to do the right thing.
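A minimal sketch of the callback flow described above, assuming the PML registers an error callback on the BTL and that callback removes the BTL from the list of usable BTLs. Every name here (btl_t, btl_error_cb_t, pml_register_error_cb, bml_del_btl_sketch, and so on) is hypothetical; bml_del_btl_sketch only plays the role of mca_bml.bml_del_btl(btl), and the real Open MPI interfaces differ.

```c
#include <assert.h>
#include <stddef.h>

struct btl;

/* Hypothetical error-callback type registered by the PML. */
typedef void (*btl_error_cb_t)(struct btl *failed);

/* Hypothetical, heavily reduced BTL module. */
typedef struct btl {
    const char    *name;
    btl_error_cb_t on_error;
} btl_t;

#define MAX_BTLS 8

/* Hypothetical stand-in for the BML's list of usable BTLs. */
static btl_t *active_btls[MAX_BTLS];
static size_t n_active = 0;

static void bml_add_btl_sketch(btl_t *b)
{
    assert(n_active < MAX_BTLS);
    active_btls[n_active++] = b;
}

/* Plays the role of mca_bml.bml_del_btl(btl): remove the BTL while
 * preserving the order (priority) of the remaining entries. */
static void bml_del_btl_sketch(btl_t *b)
{
    for (size_t i = 0; i < n_active; i++) {
        if (active_btls[i] == b) {
            for (size_t j = i + 1; j < n_active; j++) {
                active_btls[j - 1] = active_btls[j];
            }
            n_active--;
            return;
        }
    }
}

/* PML-side callback: map the failing BTL out. */
static void pml_btl_error_cb(btl_t *failed)
{
    bml_del_btl_sketch(failed);
}

static void pml_register_error_cb(btl_t *b, btl_error_cb_t cb)
{
    b->on_error = cb;
}

/* Called by the BTL itself when, e.g., its socket fails. */
static void btl_report_error(btl_t *b)
{
    if (b->on_error != NULL) {
        b->on_error(b);
    }
}
```

This only covers removing the BTL from future scheduling; fragments already handed to the failed BTL are not recovered, which is the retransmission problem Rolf mentions.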

However, making all of this work requires changes to the PML OB1 layer.

We are also working out what to do about retransmission when we get an error.

Rolf
