Hi Eric,
Does your app also work with MPICH? The ROMIO bundled with Open MPI is
getting a bit old, so it would be useful to know whether you see the
same valgrind error with a recent MPICH.
Howard
2014-12-19 9:50 GMT-07:00 Eric Chamberland
<eric.chamberl...@giref.ulaval.ca>:
Hi,
I encountered a new bug while testing our collective MPI I/O
functionalities over NFS. This is not a big issue for us, but I
think someone should have a look at it.
When running with 3 processes, we get this error on ranks #0 and #2;
rank #1 has nothing to write (zero-length buffer) in this particular
PMPI_File_write_all_begin call:
==19211== Syscall param write(buf) points to uninitialised byte(s)
==19211== at 0x10CB739D: ??? (in /lib64/libpthread-2.17.so)
==19211== by 0x27438431: ADIOI_NFS_WriteStrided (ad_nfs_write.c:645)
==19211== by 0x27451963: ADIOI_GEN_WriteStridedColl
(ad_write_coll.c:159)
==19211== by 0x274321BD: MPIOI_File_write_all_begin
(write_allb.c:114)
==19211== by 0x27431DBF:
mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
==19211== by 0x2742A367: mca_io_romio_file_write_all_begin
(io_romio_file_write.c:264)
==19211== by 0x12126520: PMPI_File_write_all_begin
(pfile_write_all_begin.c:74)
==19211== by 0x4D7CFB: SYEnveloppeMessage<std::string>
PAIO::ecritureIndexeParBlocMPI<PAIOType<double>,
PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >
>(PAGroupeProcessus&, ompi_file_t*, long long,
PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>&,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&,
long, DistributionComposantes&, long, unsigned long, unsigned long,
std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4E9A67:
GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4C79A2:
GISLectureEcriture<double>::ecritGISMPI(std::string,
GroupeInfoSur<double> const&, std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4961AD: main (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== Address 0x295af060 is 144 bytes inside a block of size
524,288 alloc'd
==19211== at 0x4C2C27B: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19211== by 0x2745E78E: ADIOI_Malloc_fn (malloc.c:50)
==19211== by 0x2743757C: ADIOI_NFS_WriteStrided (ad_nfs_write.c:497)
==19211== by 0x27451963: ADIOI_GEN_WriteStridedColl
(ad_write_coll.c:159)
==19211== by 0x274321BD: MPIOI_File_write_all_begin
(write_allb.c:114)
==19211== by 0x27431DBF:
mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
==19211== by 0x2742A367: mca_io_romio_file_write_all_begin
(io_romio_file_write.c:264)
==19211== by 0x12126520: PMPI_File_write_all_begin
(pfile_write_all_begin.c:74)
==19211== by 0x4D7CFB: SYEnveloppeMessage<std::string>
PAIO::ecritureIndexeParBlocMPI<PAIOType<double>,
PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >
>(PAGroupeProcessus&, ompi_file_t*, long long,
PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>&,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&,
long, DistributionComposantes&, long, unsigned long, unsigned long,
std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4E9A67:
GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4C79A2:
GISLectureEcriture<double>::ecritGISMPI(std::string,
GroupeInfoSur<double> const&, std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4961AD: main (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== Uninitialised value was created by a heap allocation
==19211== at 0x4C2C27B: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19211== by 0x2745E78E: ADIOI_Malloc_fn (malloc.c:50)
==19211== by 0x2743757C: ADIOI_NFS_WriteStrided (ad_nfs_write.c:497)
==19211== by 0x27451963: ADIOI_GEN_WriteStridedColl
(ad_write_coll.c:159)
==19211== by 0x274321BD: MPIOI_File_write_all_begin
(write_allb.c:114)
==19211== by 0x27431DBF:
mca_io_romio_dist_MPI_File_write_all_begin (write_allb.c:44)
==19211== by 0x2742A367: mca_io_romio_file_write_all_begin
(io_romio_file_write.c:264)
==19211== by 0x12126520: PMPI_File_write_all_begin
(pfile_write_all_begin.c:74)
==19211== by 0x4D7CFB: SYEnveloppeMessage<std::string>
PAIO::ecritureIndexeParBlocMPI<PAIOType<double>,
PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >
>(PAGroupeProcessus&, ompi_file_t*, long long,
PtrPorteurConst<Arete, Arete>, PtrPorteurConst<Arete, Arete>,
FunctorCopieInfosSurDansVectPAType<PAIOType<double>,
std::vector<InfoSur<double, Arete>*, std::allocator<InfoSur<double,
Arete>*> > const>&,
FunctorAccesseurPorteurLocal<PtrPorteurConst<Arete, Arete> >&,
long, DistributionComposantes&, long, unsigned long, unsigned long,
std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4E9A67:
GISLectureEcriture<double>::visiteMaillage(Maillage const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4C79A2:
GISLectureEcriture<double>::ecritGISMPI(std::string,
GroupeInfoSur<double> const&, std::string const&) (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211== by 0x4961AD: main (in
/home/mefpp_ericc/GIREF/bin/Test.LectureEcritureGISMPI.opt)
==19211==
I can't tell whether it is a big issue or not, but I thought I should
mention it to the list.
We run without this valgrind error when I use my local disk
partition instead of an NFS partition, or when I run with only 1
process (which always has something to write for each
PMPI_File_write_all_begin) and write to an NFS partition.
Using openmpi-1.8.4rc3 compiled in "debug" mode:
ompi_info -all:
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz
config.log:
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz
Thanks,
Eric
_________________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/12/16691.php