Okay, I fixed those two and will release rc4 once the coll/ml fix has been 
reviewed. Thanks

On Aug 1, 2014, at 2:46 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> Also, the latest commit to openib on origin/v1.8
> (https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
> 
> 11:45:01 + timeout -s SIGSEGV 3m /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,openib /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 WARNING: There are more than one active ports on host 'hpctest', but the
> 11:45:01 default subnet GID prefix was detected on more than one of these
> 11:45:01 ports.  If these ports are connected to different physical IB
> 11:45:01 networks, this configuration will fail in Open MPI.  This version of
> 11:45:01 Open MPI requires that every physically separate IB subnet that is
> 11:45:01 used between connected MPI processes must have different subnet ID
> 11:45:01 values.
> 11:45:01 
> 11:45:01 Please see this FAQ entry for more details:
> 11:45:01 
> 11:45:01   http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
> 11:45:01 
> 11:45:01 NOTE: You can turn off this warning by setting the MCA parameter
> 11:45:01       btl_openib_warn_default_gid_prefix to 0.
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 
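For what it's worth, the GID-prefix warning above is harmless when both ports really do sit on the same IB subnet; a minimal sketch of how to silence it, using the flag named in the warning and the same install and test program shown in the log:

    mpirun -np 8 -mca btl_openib_warn_default_gid_prefix 0 \
        -mca pml ob1 -mca btl self,openib \
        /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi
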
> --------------------------------------------------------------------------
> 11:45:01 WARNING: No queue pairs were defined in the btl_openib_receive_queues
> 11:45:01 MCA parameter.  At least one queue pair must be defined.  The
> 11:45:01 OpenFabrics (openib) BTL will therefore be deactivated for this run.
> 11:45:01 
> 11:45:01   Local host: hpctest
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 
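The receive_queues warning is the actual regression: with no queue pairs defined, the openib BTL deactivates itself, leaving only "self", which is what produces the reachability errors further down. Until the commit is fixed, a possible workaround is to pass an explicit queue-pair spec by hand; the values below are only illustrative (a single per-peer QP with a 64 KB buffer), not the shipped default:

    mpirun -np 8 -mca pml ob1 -mca btl self,openib \
        -mca btl_openib_receive_queues P,65536,256,192,128 \
        ./hello_usempi
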
> --------------------------------------------------------------------------
> 11:45:01 At least one pair of MPI processes are unable to reach each other for
> 11:45:01 MPI communications.  This means that no Open MPI device has indicated
> 11:45:01 that it can be used to communicate between these processes.  This is
> 11:45:01 an error; Open MPI requires that all MPI processes be able to reach
> 11:45:01 each other.  This error can sometimes be the result of forgetting to
> 11:45:01 specify the "self" BTL.
> 11:45:01 
> 11:45:01   Process 1 ([[55281,1],1]) is on host: hpctest
> 11:45:01   Process 2 ([[55281,1],0]) is on host: hpctest
> 11:45:01   BTLs attempted: self
> 11:45:01 
> 11:45:01 Your MPI job is now going to abort; sorry.
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 MPI_INIT has failed because at least one MPI process is unreachable
> 11:45:01 from another.  This *usually* means that an underlying communication
> 11:45:01 plugin -- such as a BTL or an MTL -- has either not loaded or not
> 11:45:01 allowed itself to be used.  Your MPI job will now abort.
> 11:45:01 
> 11:45:01 You may wish to try to narrow down the problem;
> 11:45:01 
> 11:45:01  * Check the output of ompi_info to see which BTL/MTL plugins are
> 11:45:01    available.
> 11:45:01  * Run your application with MPI_THREAD_SINGLE.
> 11:45:01  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
> 11:45:01    if using MTL-based communications) to see exactly which
> 11:45:01    communication plugins were considered and/or discarded.
> 11:45:01 
> --------------------------------------------------------------------------
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2761] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2757] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2751] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2752] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2753] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2755] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2759] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 11:45:01 *** An error occurred in MPI_Init
> 11:45:01 *** on a NULL communicator
> 11:45:01 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> 11:45:01 ***    and potentially your MPI job)
> 11:45:01 [hpctest:2763] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> 
> 
> On Fri, Aug 1, 2014 at 11:00 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Note that the Solaris unresolved-alloca problem that George fixed in r32388 is
> still present in 1.8.2rc3. I have manually confirmed that the same patch
> resolves the problem in 1.8.2rc3.
> 
> -Paul
> 
> 
> On Thu, Jul 31, 2014 at 9:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Usual place - this is a last-chance check, so please hit it. The main change
> from rc2 is the repair of the Fortran binding config logic.
> 
> http://www.open-mpi.org/software/ompi/v1.8/
> 
> Ralph
> 
> 
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> 
> 
