Thanks Ralph,
It is now *much* better: all sequential executions are working... ;)
but I still have issues with a lot of parallel tests... (but not all)
The SHA tested last night was c3c262b.
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.14.01h20m32s_config.log
Here is the backtrace for most of these failures:
*** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt': free(): invalid pointer: 0x00007f9ab09c6020 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7277f)[0x7f9ab019b77f]
/lib64/libc.so.6(+0x78026)[0x7f9ab01a1026]
/lib64/libc.so.6(+0x78d53)[0x7f9ab01a1d53]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x172a1)[0x7f9aa3df32a1]
/opt/openmpi-2.x_opt/lib/libmpi.so.0(MPI_Request_free+0x4c)[0x7f9ab0761dac]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adaf9)[0x7f9ab7fa2af9]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f9ab7f9dc35]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574e7)[0x7f9ab7f4c4e7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f9ab7ef28ca]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_Z15GIREFVecDestroyRP6_p_Vec+0xe)[0x7f9abc9746de]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN12VecteurPETScD1Ev+0x31)[0x7f9abca8bfa1]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD2Ev+0x20c)[0x7f9abc9a013c]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD0Ev+0x9)[0x7f9abc9a01f9]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Formulation.so(_ZN10ProblemeGDD2Ev+0x42)[0x7f9abeeb94e2]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4159b9]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9ab014ab25]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4084dc]
The very same code and tests all work well with
openmpi-1.{8.4,10.2} and the same version of PETSc...
And the segfault with MPI_File_write_all_end seems to be gone... Thanks to
Edgar! :)
By the way, I am wondering when I should report a bug and when I should not, since I am
"blindly" cloning master at around 01h20 am every day, regardless of its
"status"... I don't want to bother anyone on this list with annoying bug
reports... So please tell me what you would prefer...
Thanks,
Eric
On 13/07/16 08:36 PM, Ralph Castain wrote:
Fixed on master
On Jul 13, 2016, at 12:47 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
I literally just noticed that this morning (that singletons were broken on
master), but hadn't gotten around to bisecting / reporting it yet...
I also haven't tested 2.0.0. I really hope singletons aren't broken there...
/me goes to test 2.0.0...
Whew -- 2.0.0 singletons are fine. :-)
On Jul 13, 2016, at 3:01 PM, Ralph Castain <r...@open-mpi.org> wrote:
Hmmm… I see where the singleton on master might be broken - will check later today.
On Jul 13, 2016, at 11:37 AM, Eric Chamberland <eric.chamberl...@giref.ulaval.ca> wrote:
Hi Howard,
ok, I will wait for 2.0.1rcX... ;)
I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our
code from the git repos.
Now I am in a somewhat uncomfortable situation where neither the
ompi-release.git nor the ompi.git repo is working for me.
The first gives me the MPI_File_write_all_end errors I reported, and the
latter gives me errors like these:
[lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file ess_singleton_module.c at line 167
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[lorien:106919] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
So, for my continuous integration of OpenMPI I am in a no man's land... :(
Thanks anyway for the follow-up!
Eric
On 13/07/16 07:49 AM, Howard Pritchard wrote:
Hi Eric,
Thanks very much for finding this problem. In order to have a reasonably
timely release, we decided to triage issues and turn around a new RC only if
something drastic appeared. We want to fix this issue (and it will be fixed),
but we've decided to defer the fix to a 2.0.1 bug fix release.
Howard
2016-07-12 13:51 GMT-06:00 Eric Chamberland <eric.chamberl...@giref.ulaval.ca>:
Hi Edgar,
I just saw that your patch got into ompi/master... any chance it
goes into ompi-release/v2.x before rc5?
thanks,
Eric
On 08/07/16 03:14 PM, Edgar Gabriel wrote:
I think I found the problem. I filed a PR against master, and if that
passes I will file a PR for the 2.x branch.
Thanks!
Edgar
On 7/8/2016 1:14 PM, Eric Chamberland wrote:
On 08/07/16 01:44 PM, Edgar Gabriel wrote:
OK, but just so I can construct a test case: basically, what you are
doing is

MPI_File_write_all_begin (fh, NULL, 0, some datatype);
MPI_File_write_all_end (fh, NULL, &status);

is this correct?
Yes, but with 2 processes: rank 0 writes something, but rank 1 does not...
Other info: rank 0 didn't wait for rank 1 after MPI_File_write_all_end, so
it continued on to the next MPI_File_write_all_begin with a different
datatype, but on the same file...
thanks!
Eric
_______________________________________________
devel mailing list
de...@open-mpi.org <mailto:de...@open-mpi.org>
Subscription:
https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19173.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19192.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19201.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19202.php
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19203.php
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19208.php