Rolf,
There are several things that doesn't make sense in this patch,
several design approaches that are unclear and several coding
standards that have not been respected. I didn't test the changes for
correctness, but a quick reading through the svn diff raised some
warnings.
Here is a short list:
- way too many "#if 0"
- The error reporting function get two new arguments, which are used
in only one instance. Moreover, when they are used (and this is in the
openib BTL) they are returning something the openib BTL is not
supposed to know ... another endpoint not necessarily related to
openib. I wonder why one expects the openIB BTL to move descriptors/
fragments from one module to another? There are two problems with
this: first it only works when there are multiple openib BTLs, i.e. no
failover over another BTL; second, if there was another openib
endpoint ... the openib component is already supposed to know about
it, there is no reason to get it back from the PML.
- opal_output_verbose on the critical path in several places.
- a new mca_bml_base_output barely used, and only in the critical path.
- the new field btl_ifname in the BTL module is mostly used in the
openib BTL. There is an exception, in some of the opal_output_verbose
on the critical path I was talking previously.
- ompi_proc_all return a copy of the whole list of known processes.
This list is never freed, so there is a big memory leak.
george.
On Aug 5, 2009, at 17:53 , ro...@osl.iu.edu wrote:
Author: rolfv
Date: 2009-08-05 17:53:02 EDT (Wed, 05 Aug 2009)
New Revision: 21767
URL: https://svn.open-mpi.org/trac/ompi/changeset/21767
Log:
HCA failover support in openib BTL
Text files modified:
trunk/ompi/mca/bml/base/bml_base_open.c | 10
trunk/ompi/mca/bml/bml.h | 5
trunk/ompi/mca/bml/r2/bml_r2.c | 11
trunk/ompi/mca/btl/btl.h | 6
trunk/ompi/mca/btl/elan/btl_elan.c | 1
trunk/ompi/mca/btl/gm/btl_gm.c | 1
trunk/ompi/mca/btl/mx/btl_mx.c | 1
trunk/ompi/mca/btl/ofud/btl_ofud.c | 1
trunk/ompi/mca/btl/openib/btl_openib.c | 8
trunk/ompi/mca/btl/openib/btl_openib.h | 3
trunk/ompi/mca/btl/openib/btl_openib_component.c | 404 ++++++++++
+++++++++++++++++++++++++++++
trunk/ompi/mca/btl/openib/btl_openib_endpoint.c | 2
trunk/ompi/mca/btl/openib/btl_openib_endpoint.h | 57 +++++
trunk/ompi/mca/btl/openib/btl_openib_frag.h | 1
trunk/ompi/mca/btl/openib/btl_openib_mca.c | 5
trunk/ompi/mca/btl/pcie/btl_pcie.c | 1
trunk/ompi/mca/btl/portals/btl_portals.c | 1
trunk/ompi/mca/btl/sctp/btl_sctp.c | 1
trunk/ompi/mca/btl/self/btl_self.c | 1
trunk/ompi/mca/btl/sm/btl_sm.c | 1
trunk/ompi/mca/btl/tcp/btl_tcp.c | 1
trunk/ompi/mca/btl/template/btl_template.c | 1
trunk/ompi/mca/btl/udapl/btl_udapl.c | 1
trunk/ompi/mca/pml/ob1/pml_ob1.c | 72 ++++++
trunk/ompi/mca/pml/ob1/pml_ob1.h | 1
trunk/ompi/mca/pml/ob1/pml_ob1_component.c | 6
trunk/ompi/mca/pml/ob1/pml_ob1_recvfrag.c | 15 +
trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c | 1
trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c | 3
29 files changed, 602 insertions(+), 20 deletions(-)