Thanks Ralph,

It is now *much* better: all sequential executions are working... ;)
but I still have issues with many parallel tests... (though not all)

The SHA tested last night was c3c262b.

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.14.01h20m32s_config.log

Here is the backtrace for most of these issues:

*** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt': free(): invalid pointer: 0x00007f9ab09c6020 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7277f)[0x7f9ab019b77f]
/lib64/libc.so.6(+0x78026)[0x7f9ab01a1026]
/lib64/libc.so.6(+0x78d53)[0x7f9ab01a1d53]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x172a1)[0x7f9aa3df32a1]
/opt/openmpi-2.x_opt/lib/libmpi.so.0(MPI_Request_free+0x4c)[0x7f9ab0761dac]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adaf9)[0x7f9ab7fa2af9]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f9ab7f9dc35]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574e7)[0x7f9ab7f4c4e7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f9ab7ef28ca]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_Z15GIREFVecDestroyRP6_p_Vec+0xe)[0x7f9abc9746de]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN12VecteurPETScD1Ev+0x31)[0x7f9abca8bfa1]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD2Ev+0x20c)[0x7f9abc9a013c]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD0Ev+0x9)[0x7f9abc9a01f9]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Formulation.so(_ZN10ProblemeGDD2Ev+0x42)[0x7f9abeeb94e2]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4159b9]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9ab014ab25]
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4084dc]

The very same code and tests all work well with openmpi-1.8.4, openmpi-1.10.2, and the same version of PETSc...

And the segfault with MPI_File_write_all_end seems gone... Thanks to Edgar! :)

Btw, I am wondering when I should report a bug and when I shouldn't, since I am "blindly" cloning around 1:20 am each day, independently of the "status" of master... I don't want to bother anyone on this list with noisy bug reports... so please tell me what you would prefer...

Thanks,

Eric


On 13/07/16 08:36 PM, Ralph Castain wrote:
Fixed on master

On Jul 13, 2016, at 12:47 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
wrote:

I literally just noticed that this morning (that singleton was broken on 
master), but hadn't gotten to bisecting / reporting it yet...

I also haven't tested 2.0.0.  I really hope singletons aren't broken then...

/me goes to test 2.0.0...

Whew -- 2.0.0 singletons are fine.  :-)


On Jul 13, 2016, at 3:01 PM, Ralph Castain <r...@open-mpi.org> wrote:

Hmmm…I see where the singleton on master might be broken - will check later 
today

On Jul 13, 2016, at 11:37 AM, Eric Chamberland 
<eric.chamberl...@giref.ulaval.ca> wrote:

Hi Howard,

ok, I will wait for 2.0.1rcX... ;)

I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our 
code from the git repos.

Now I am in a somewhat uncomfortable situation where neither the
ompi-release.git nor the ompi.git repo is working for me.

The former gives me the MPI_File_write_all_end errors I reported, while
the latter gives me errors like these:

[lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file 
ess_singleton_module.c at line 167
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[lorien:106919] Local abort before MPI_INIT completed completed successfully, 
but am not able to aggregate error messages, and not able to guarantee that all 
other processes were killed!

So, for my continuous integration of OpenMPI I am in a no man's land... :(

Thanks anyway for the follow-up!

Eric

On 13/07/16 07:49 AM, Howard Pritchard wrote:
Hi Eric,

Thanks very much for finding this problem.  We decided, in order to have
a reasonably timely release, that we'd triage issues and turn around a
new RC if something drastic appeared.  We want to fix this issue (and it
will be fixed), but we've decided to defer the fix to a 2.0.1 bug fix
release.

Howard



2016-07-12 13:51 GMT-06:00 Eric Chamberland <eric.chamberl...@giref.ulaval.ca>:

 Hi Edgar,

 I just saw that your patch got into ompi/master... any chance it
 goes into ompi-release/v2.x before rc5?

 thanks,

 Eric


 On 08/07/16 03:14 PM, Edgar Gabriel wrote:

     I think I found the problem; I filed a PR against master, and if
     that passes I will file a PR for the 2.x branch.

     Thanks!
     Edgar


     On 7/8/2016 1:14 PM, Eric Chamberland wrote:


         On 08/07/16 01:44 PM, Edgar Gabriel wrote:

             ok, but just to be able to construct a test case,
             basically what you are doing is

             MPI_File_write_all_begin(fh, NULL, 0, some_datatype);
             MPI_File_write_all_end(fh, NULL, &status);

             is this correct?

         Yes, but with 2 processes:

         rank 0 writes something, but not rank 1...

         other info: rank 0 didn't wait for rank 1 after
         MPI_File_write_all_end, so it continued to the next
         MPI_File_write_all_begin with a different datatype but on
         the same file...

         thanks!

         Eric
         _______________________________________________
         devel mailing list
         de...@open-mpi.org
         Subscription:
         https://www.open-mpi.org/mailman/listinfo.cgi/devel
         Link to this post:
         http://www.open-mpi.org/community/lists/devel/2016/07/19173.php



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
