Hi Ralph,

This fixes several MTT failures involving collectives and intercommunicators (allgather_inter and friends) that occurred when running with --mca mpi_procs_cutoff 0. I could reproduce the issue with 8 or more tasks on two nodes (I am not sure whether two nodes matter here...).
In this case, proc_list[i] might be a sentinel rather than a pointer to a real proc object, so it is not always possible to simply access proc_list[i]->super.proc_name. The fix checks for a sentinel first and, if so, decodes the process name from it.

Note that this commit was incomplete, and I pushed a second one once I figured that out.

Cheers,

Gilles

Ralph Castain <[email protected]> wrote:
>Hi Gilles
>
>Could you please explain this one - I honestly don’t understand the change,
>and haven’t encountered a problem.
>
>Thanks
>Ralph
>
>
>> On Jan 5, 2016, at 11:22 PM, [email protected] wrote:
>>
>> This is an automated email from the git hooks/post-receive script. It was
>> generated because a ref change was pushed to the repository containing
>> the project "open-mpi/ompi".
>>
>> The branch, master has been updated
>> via 213b2abde47cf02ba3152a301d3ec0ffeec54438 (commit)
>> from e4bdad09c1bf7f11dada5ae6ac32e052b553ce4b (commit)
>>
>> Those revisions listed above that are new to this repository have
>> not appeared on any other notification email; so we list those
>> revisions in full, below.
>>
>> - Log -----------------------------------------------------------------
>> https://github.com/open-mpi/ompi/commit/213b2abde47cf02ba3152a301d3ec0ffeec54438
>>
>> commit 213b2abde47cf02ba3152a301d3ec0ffeec54438
>> Author: Gilles Gouaillardet <[email protected]>
>> Date: Wed Jan 6 16:21:13 2016 +0900
>>
>>     dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
>>
>> diff --git a/ompi/dpm/dpm.c b/ompi/dpm/dpm.c
>> index 9a236d0..b1c562e 100644
>> --- a/ompi/dpm/dpm.c
>> +++ b/ompi/dpm/dpm.c
>> @@ -16,7 +16,7 @@
>> * Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights
>> * reserved.
>> * Copyright (c) 2013-2015 Intel, Inc. All rights reserved
>> - * Copyright (c) 2014-2015 Research Organization for Information Science
>> + * Copyright (c) 2014-2016 Research Organization for Information Science
>> * and Technology (RIST). All rights reserved.
>> * $COPYRIGHT$
>> *
>> @@ -167,7 +167,13 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root,
>>              dense = false;
>>          }
>>          for (i=0; i < size; i++) {
>> -            rc = opal_convert_process_name_to_string(&nstring, &(proc_list[i]->super.proc_name));
>> +            opal_process_name_t proc_name;
>> +            if (ompi_proc_is_sentinel (proc_list[i])) {
>> +                proc_name = ompi_proc_sentinel_to_name ((intptr_t) proc_list[i]);
>> +            } else {
>> +                proc_name = proc_list[i]->super.proc_name;
>> +            }
>> +            rc = opal_convert_process_name_to_string(&nstring, &proc_name);
>>              if (OPAL_SUCCESS != rc) {
>>                  if (!dense) {
>>                      free(proc_list);
>>
>>
>> -----------------------------------------------------------------------
>>
>> Summary of changes:
>> ompi/dpm/dpm.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>>
>> hooks/post-receive
>> --
>> open-mpi/ompi
>> _______________________________________________
>> ompi-commits mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>
>_______________________________________________
>devel mailing list
>[email protected]
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post:
>http://www.open-mpi.org/community/lists/devel/2016/01/18473.php
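For readers unfamiliar with sentinels: the diff only shows that an entry of proc_list may be a sentinel, that ompi_proc_is_sentinel() detects one, and that ompi_proc_sentinel_to_name() recovers the process name from the (intptr_t) value. The self-contained C sketch below illustrates the general tagged-pointer idea under stated assumptions; it is not the Open MPI implementation. The types proc_name_t and proc_t and every helper in it are hypothetical stand-ins, and the encoding (low bit set as the tag, jobid and vpid packed into the remaining bits of a 64-bit word) is assumed purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for opal_process_name_t and ompi_proc_t. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} proc_name_t;

typedef struct {
    proc_name_t name;   /* a fully instantiated proc stores its name here */
} proc_t;

/* Assumption of this sketch: 64-bit pointers, so a whole name fits in one word. */
_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

/* Real proc objects are at least 2-byte aligned, so bit 0 of a genuine
 * pointer is always 0; a set low bit marks a sentinel (assumed encoding). */
static bool proc_is_sentinel(const proc_t *p)
{
    return ((uintptr_t) p & 0x1u) != 0;
}

/* Pack (jobid, vpid) above the tag bit. Shifting left by one costs the
 * top bit, so this sketch requires jobid to fit in 31 bits. */
static proc_t *proc_name_to_sentinel(proc_name_t name)
{
    assert(name.jobid < (UINT32_C(1) << 31));
    uintptr_t packed = ((uintptr_t) name.jobid << 32) | name.vpid;
    return (proc_t *) ((packed << 1) | 0x1u);
}

/* Undo the packing: drop the tag bit, then split the word back apart. */
static proc_name_t proc_sentinel_to_name(const proc_t *p)
{
    uintptr_t packed = (uintptr_t) p >> 1;
    proc_name_t name = { (uint32_t) (packed >> 32), (uint32_t) packed };
    return name;
}

/* Never dereference an entry until it is known not to be a sentinel. */
static proc_name_t proc_get_name(const proc_t *p)
{
    if (proc_is_sentinel(p)) {
        return proc_sentinel_to_name(p);
    }
    return p->name;
}
```

Here proc_get_name() plays the role of the fixed loop body: it tests for a sentinel before touching p->name, whereas the original code dereferenced proc_list[i] unconditionally, which is only valid for real proc objects.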
