Yes - this is the fix for that issue
On Thu, May 14, 2015 at 8:54 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> Is this by any chance associated with issue 579?
>
>
> 2015-05-14 20:49 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>
>> I'll look at the lines you cite, but that clearly isn't the problem we
>> are seeing here. I can verify that because the test case:
>>
>> mpirun -n 1 sleep 1000
>>
>> does not open up any connections at all. Thus, the use-case you describe
>> never occurs - yet we still blow up in memory. If I simply tell the OOB not
>> to set keep alive, the problem goes away.
>>
>> It only happens on Mac, and we never see Mac-based clusters, so turning
>> off keep alive on the Mac seems a pretty simple solution.
>>
>>
>> On Thu, May 14, 2015 at 8:43 PM, George Bosilca <bosi...@icl.utk.edu>
>> wrote:
>>
>>> Ralph,
>>>
>>> The code pushed in g8e30579 is clearly not the right solution.
>>>
>>> The problem starts in oob_tcp_listener.c line 742. A new
>>> mca_oob_tcp_pending_connection_t object is allocated to store the incoming
>>> connection. The accept a few lines below fails with an error code of 0x23
>>> which means "resource temporarily unavailable" on OS X (i.e. EAGAIN). Thus,
>>> the if at line 750 is skipped, and we reach line 763 (a "continue") with 1)
>>> a connection not accepted, and 2) an allocated object not released. Voila!
>>>
>>> Freeing the pending_connection object is not the right approach either,
>>> as it will only remove the memory leak but the process will become a CPU
>>> hog.
>>>
>>> Thanks,
>>> George.
>>>
>>>
>>>
>>>
>>> On Thu, May 14, 2015 at 8:10 PM, <git...@crest.iu.edu> wrote:
>>>
>>>> This is an automated email from the git hooks/post-receive script. It was
>>>> generated because a ref change was pushed to the repository containing
>>>> the project "open-mpi/ompi".
>>>>
>>>> The branch, master has been updated
>>>>        via  8e30579e6efab580cf9cf1bec8f8df1376b7e9ef (commit)
>>>>       from  1488e82efd1d09c30ba46dfa00b89e623623272f (commit)
>>>>
>>>> Those revisions listed above that are new to this repository have
>>>> not appeared on any other notification email; so we list those
>>>> revisions in full, below.
>>>>
>>>> - Log -----------------------------------------------------------------
>>>>
>>>> https://github.com/open-mpi/ompi/commit/8e30579e6efab580cf9cf1bec8f8df1376b7e9ef
>>>>
>>>> commit 8e30579e6efab580cf9cf1bec8f8df1376b7e9ef
>>>> Author: Ralph Castain <r...@open-mpi.org>
>>>> Date:   Thu May 14 18:09:13 2015 -0600
>>>>
>>>>     The Mac appears to have problems with the keepalive support - once
>>>>     keepalive starts, the memory footprint soars. So disable keepalive on the Mac
>>>>
>>>> diff --git a/config/opal_check_os_flavors.m4 b/config/opal_check_os_flavors.m4
>>>> index d1d124d..4939560 100644
>>>> --- a/config/opal_check_os_flavors.m4
>>>> +++ b/config/opal_check_os_flavors.m4
>>>> @@ -57,6 +57,12 @@ AC_DEFUN([OPAL_CHECK_OS_FLAVORS],
>>>>                         [$opal_have_solaris],
>>>>                         [Whether or not we have solaris])
>>>>
>>>> +    AS_IF([test "$opal_found_apple" = "yes"],
>>>> +          [opal_have_mac=1], [opal_have_mac=0])
>>>> +    AC_DEFINE_UNQUOTED([OPAL_HAVE_MAC],
>>>> +                       [$opal_have_mac],
>>>> +                       [Whether or not we are on a Mac])
>>>> +
>>>>      # check for sockaddr_in (a good sign we have TCP)
>>>>      AC_CHECK_HEADERS([netdb.h netinet/in.h netinet/tcp.h])
>>>>      AC_CHECK_TYPES([struct sockaddr_in],
>>>> diff --git a/orte/mca/oob/tcp/oob_tcp_common.c b/orte/mca/oob/tcp/oob_tcp_common.c
>>>> index a768472..e3decf2 100644
>>>> --- a/orte/mca/oob/tcp/oob_tcp_common.c
>>>> +++ b/orte/mca/oob/tcp/oob_tcp_common.c
>>>> @@ -72,7 +72,7 @@
>>>>  /**
>>>>   * Set socket buffering
>>>>   */
>>>> -
>>>> +#if defined(SO_KEEPALIVE) && !OPAL_HAVE_MAC
>>>>  static void set_keepalive(int sd)
>>>>  {
>>>>      int option;
>>>> @@ -146,6 +146,7 @@ static void set_keepalive(int sd)
>>>>      }
>>>>  #endif  // TCP_KEEPCNT
>>>>  }
>>>> +#endif //SO_KEEPALIVE
>>>>
>>>>  void orte_oob_tcp_set_socket_options(int sd)
>>>>  {
>>>> @@ -181,7 +182,7 @@ void orte_oob_tcp_set_socket_options(int sd)
>>>>                          opal_socket_errno);
>>>>      }
>>>>  #endif
>>>> -#if defined(SO_KEEPALIVE)
>>>> +#if defined(SO_KEEPALIVE) && !OPAL_HAVE_MAC
>>>>      if (0 < mca_oob_tcp_component.keepalive_time) {
>>>>          set_keepalive(sd);
>>>>      }
>>>> diff --git a/orte/mca/oob/tcp/oob_tcp_component.c b/orte/mca/oob/tcp/oob_tcp_component.c
>>>> index dd1af2a..372ed4c 100644
>>>> --- a/orte/mca/oob/tcp/oob_tcp_component.c
>>>> +++ b/orte/mca/oob/tcp/oob_tcp_component.c
>>>> @@ -404,7 +404,7 @@ static int tcp_component_register(void)
>>>>                                            &mca_oob_tcp_component.disable_ipv6_family);
>>>>  #endif
>>>>
>>>> -
>>>> +#if !OPAL_HAVE_MAC
>>>>      mca_oob_tcp_component.keepalive_time = 10;
>>>>      (void)mca_base_component_var_register(component, "keepalive_time",
>>>>                                            "Idle time in seconds before starting to send keepalives (num <= 0 ----> disable keepalive)",
>>>> @@ -427,7 +427,8 @@ static int tcp_component_register(void)
>>>>                                            OPAL_INFO_LVL_9,
>>>>                                            MCA_BASE_VAR_SCOPE_READONLY,
>>>>                                            &mca_oob_tcp_component.keepalive_probes);
>>>> -
>>>> +#endif
>>>> +
>>>>      mca_oob_tcp_component.retry_delay = 0;
>>>>      (void)mca_base_component_var_register(component, "retry_delay",
>>>>                                            "Time (in sec) to wait before trying to connect to peer again",
>>>>
>>>>
>>>> -----------------------------------------------------------------------
>>>>
>>>> Summary of changes:
>>>>  config/opal_check_os_flavors.m4      | 6 ++++++
>>>>  orte/mca/oob/tcp/oob_tcp_common.c    | 5 +++--
>>>>  orte/mca/oob/tcp/oob_tcp_component.c | 5 +++--
>>>>  3 files changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>>
>>>> hooks/post-receive
>>>> --
>>>> open-mpi/ompi
>>>> _______________________________________________
>>>> ompi-commits mailing list
>>>> ompi-comm...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/05/17401.php
>>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/05/17402.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/05/17403.php
>
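
For readers following the thread: the leak pattern George describes (an object allocated before accept(), then stranded when accept() returns EAGAIN) can be illustrated with a minimal C sketch. This is not the Open MPI code; pending_conn_t, live_objects, release_conn, make_nonblocking_listener and drain_listener are all hypothetical names invented for the illustration. The sketch only shows one way to keep the allocation from being stranded, by allocating after accept() succeeds. As George points out, avoiding the leak by itself would not explain or stop the repeated failing accept() calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for mca_oob_tcp_pending_connection_t. */
typedef struct pending_conn {
    int fd;
    struct sockaddr_storage addr;
} pending_conn_t;

/* Number of live pending_conn_t objects, tracked only to make leaks visible. */
static int live_objects = 0;

/* Hypothetical consumer: a real one would hand the connection to an event loop. */
static void release_conn(pending_conn_t *pc)
{
    close(pc->fd);
    free(pc);
    live_objects--;
}

/* Hypothetical helper: non-blocking TCP listener on a loopback ephemeral port. */
static int make_nonblocking_listener(void)
{
    struct sockaddr_in sin;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;  /* let the kernel pick a port */

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Accept every connection currently queued on a non-blocking listener.
 * The object is allocated only after accept() succeeds, so an EAGAIN
 * return cannot strand an allocation. Returns the number of connections
 * accepted, or -1 on an unexpected error.
 */
static int drain_listener(int listen_fd, void (*handoff)(pending_conn_t *))
{
    int accepted = 0;
    assert(listen_fd >= 0);

    for (;;) {
        struct sockaddr_storage addr;
        socklen_t len = sizeof(addr);
        int sd = accept(listen_fd, (struct sockaddr *)&addr, &len);

        if (sd < 0) {
            if (EINTR == errno) continue;              /* interrupted: retry */
            if (EAGAIN == errno || EWOULDBLOCK == errno) {
                return accepted;                       /* nothing queued */
            }
            return -1;                                 /* unexpected failure */
        }

        pending_conn_t *pc = malloc(sizeof(*pc));
        if (NULL == pc) {
            close(sd);
            return -1;
        }
        pc->fd = sd;
        pc->addr = addr;
        live_objects++;
        handoff(pc);                                   /* consumer owns pc now */
        accepted++;
    }
}
```

Calling drain_listener(make_nonblocking_listener(), release_conn) with no client connected takes the EAGAIN branch on the first accept() and returns 0 without allocating anything, which is the case the thread is concerned with.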