Bill --

Check out http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork.

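To my knowledge, RHEL4 has not yet received a hotfix that allows fork() in OpenFabrics verbs applications while memory is still registered in the parent. Note that system() calls fork() internally, so the system("cp ...") in your test case falls squarely into this category.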
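For anyone wondering why a plain system() call is affected: system() creates its child with fork(). The code below is a simplified sketch of roughly what it does. `sketch_system` is a hypothetical name, and glibc's real system() also manages SIGCHLD, SIGINT and SIGQUIT, which this omits. The fork() step is the hazard. After it, parent and child share pages copy-on-write, so a later write by the parent to a registered buffer can land on a new physical page that the HCA does not know about.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for system(). NOT glibc's
 * implementation: signal handling is omitted entirely. */
static int sketch_system(const char *cmd)
{
    /* fork(): the child starts with a copy-on-write view of the
     * parent's address space, including any verbs-registered buffers. */
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the shell running cmd. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;                      /* wait-status encoding, as system() returns */
}
```

Calling sketch_system("cp test.dat test2.dat") forks exactly as the test case's system() call does. The return value uses the same wait-status encoding, so WIFEXITED() and WEXITSTATUS() apply to it.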


On Aug 6, 2007, at 7:53 AM, Bill Wichser wrote:

We have run across an issue, probably more related to openib than to openmpi, but we don't know how to resolve it.

Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
libibverbs-1.0.4-7

openmpi - it doesn't matter - 1.1.5 and 1.2.3 both fail.

When the sample code is run across IB nodes using the IB interface, the receive just hangs whenever the system() call is issued. Removing the system() call removes the hang. Running across the nodes over TCP removes the hang. Running on a single node removes the hang. Only when using the IB interface do we have this hang.

So the simple solution is "don't do this," but apparently something deeper is involved, and who knows where it will pop up again.
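If "don't do this" is an acceptable answer for the test case, one way to do the copy without creating a child process at all is plain stdio. This is a sketch, not a drop-in cp. `copy_file` is a hypothetical helper, and unlike cp it does not preserve permissions or timestamps.

```c
#include <stdio.h>

/* Hypothetical helper: copy src to dst with buffered stdio, without
 * fork()/exec. Returns 0 on success, -1 on any open/read/write/close error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;                    /* short write, e.g. disk full */
            break;
        }
    }
    if (ferror(in))
        rc = -1;                        /* read error, as opposed to EOF */

    fclose(in);
    if (fclose(out) != 0)               /* fclose flushes; a failed flush fails the copy */
        rc = -1;
    return rc;
}
```

In the sample code, `sysstatus = copy_file("test.dat", "test2.dat");` would replace the system() line on rank 0. This removes only the fork() that system() performs. Anything else in the application that forks while memory is registered is still exposed.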

Thanks,
Bill

PS - The sample code was compiled using mpicc, built with gcc. You'll need a test.dat file for the system("cp") command.
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <mpi.h>
#include <unistd.h>

char All[4840];
int ThisTask;
int NTask;

int main(int argc, char **argv)
{
  int task;
  int nothing = 0;   /* payload is irrelevant; initialized so the send is well-defined */
  MPI_Status status;

  int errorFlag = 0;
  int sysstatus;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);
#if 1
  if(ThisTask == 0) {
      printf("Task %d cmd run\n", ThisTask);
      sysstatus = system("cp test.dat test2.dat");
      printf("Task %d cmd status %d\n", ThisTask, sysstatus);
  }
#else
  if (ThisTask == 0) {
     sleep(60);
  }
#endif

  if (ThisTask == 0) {
    printf("Task 0 Wait Loop START\n");
    for (task = 1; task < NTask; task++) {
       printf("Task %d Recv START\n", task);
       MPI_Recv(&nothing, sizeof(nothing), MPI_BYTE, task, 0, MPI_COMM_WORLD,
                &status);
       printf("Task %d Recv END\n", task);
    }
    printf("Task 0 Wait Loop END\n");
  }
  else {
    printf("Task %d Send START\n", ThisTask);
    MPI_Send(&nothing, sizeof(nothing), MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    printf("Task %d Send END\n", ThisTask);
  }

  printf("Task %d Finalize START\n", ThisTask);
  MPI_Finalize();               /* clean up & finalize MPI */
  printf("Task %d Finalize END\n", ThisTask);

  return 0;
}
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
