Hi Boyan,

Thank you for reporting this. I’ll definitely look into it, but I wanted to send a 
quick reply first.

You should not use the MPIPOSIX driver for parallel I/O (unless there is a very 
compelling reason to do so); use the MPIO driver instead, as you will get much 
better performance. I think you are just reporting the make check failure you 
were getting here, but I wanted to make sure you are using the better option.

Did you see this issue only when using the Intel compilers, i.e. have you tried 
other compilers and seen the same error?

You are correct about the bug in t_dset.c: comm_size needs to be comm_rank. 
Fortunately, neither the size nor the rank is used in this function, so I will 
just take them out for the next release.

Thanks,
Mohamad

From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Boyan Bejanov
Sent: Thursday, January 02, 2014 6:11 PM
To: [email protected]
Cc: Boyan Bejanov
Subject: [Hdf-forum] Problem with MPIPOSIX driver

Hello,

I am trying to build parallel HDF5 1.8.12 on RHEL 6.4 using the Intel compilers 
(icc for C and ifort for Fortran) and Intel MPI 4.1.0 (both included in the 
Intel Cluster Studio XE 2013).  Everything configured and compiled okay, 
although I am getting failures when running “make check”, specifically in 
“testpar/testphdf5”.  The error message is the following:

...
Testing  -- test cause for broken collective io (nocolcause)
Testing  -- test cause for broken collective io (nocolcause)
Testing  -- test cause for broken collective io (nocolcause)
Testing  -- test cause for broken collective io (nocolcause)
Testing  -- test cause for broken collective io (nocolcause)
Testing  -- test cause for broken collective io (nocolcause)
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(949): MPI_Barrier(comm=0x0) failed
PMPI_Barrier(903): Invalid communicator
...

I have managed to narrow it down: this error is generated when the MPIPOSIX 
driver is used (I think it’s in the call to H5Pset_fapl_mpiposix() itself).  
The “nocolcause” test is in the function “no_collective_cause_tests” in file 
“testpar/t_dset.c”.  When I comment out the two lines in this function that set 
the “TEST_SET_MPIPOSIX” flag, thus skipping the MPIPOSIX test, everything else 
checks successfully.
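
For what it’s worth, I believe the failing call boils down to something like 
this minimal standalone sketch (the file name here is just a placeholder of 
mine, not taken from the test):

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Create a file access property list and select the MPIPOSIX driver
           (still present in 1.8.12); the last argument leaves GPFS hints off. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpiposix(fapl, MPI_COMM_WORLD, 0);

        /* "repro.h5" is just a placeholder file name. */
        hid_t file = H5Fcreate("repro.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }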

I am very new to HDF5, and to parallel I/O in general, so any help with this 
issue would be much appreciated.
Thanks,
Boyan

PS: While poking around, I found an inconsequential bug in testpar/t_dset.c on 
line 3681 (in the same function): the line is
     MPI_Comm_size(MPI_COMM_WORLD, &mpi_rank);
while it should be
     MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);


