r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we guarantee the event base is freed, we're exercising a code path that was never used before. Apparently the orted initializes the ev->timebase min_heap_t structure, but then never uses it, so the pointer to the array of events in the heap is still NULL when we get to the destructor. Previously, the destructor freed the array unconditionally. I added a NULL check, which avoids the problem.
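To make the shape of the fix concrete, here is a minimal sketch of the pattern, not the actual r20569 change. All sketch_heap_* names are hypothetical stand-ins and the heap ordering logic is omitted; the only point is that the array stays NULL until the first push, so the destructor has to tolerate that. (Standard free(NULL) is a no-op, so the "Invalid free" report presumably comes from the malloc debug wrapper being stricter; that part is an assumption.)

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the heap of pending timer events. */
typedef struct sketch_heap {
    void   **items;    /* array of event pointers; NULL until first push */
    unsigned count;    /* entries in use */
    unsigned capacity; /* entries allocated */
} sketch_heap_t;

/* Constructor: allocates nothing, so items starts out NULL. */
void sketch_heap_ctor(sketch_heap_t *h)
{
    assert(h != NULL);
    h->items = NULL;
    h->count = 0;
    h->capacity = 0;
}

/* Append an item, growing the array on demand (ordering omitted). */
int sketch_heap_push(sketch_heap_t *h, void *item)
{
    assert(h != NULL);
    if (h->count == h->capacity) {
        unsigned new_cap = h->capacity ? h->capacity * 2 : 8;
        void **grown = realloc(h->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1; /* allocation failed; heap left unchanged */
        }
        h->items = grown;
        h->capacity = new_cap;
    }
    h->items[h->count++] = item;
    return 0;
}

/* Destructor: only free the array if it was ever allocated. */
void sketch_heap_dtor(sketch_heap_t *h)
{
    assert(h != NULL);
    if (h->items != NULL) {
        free(h->items);
        h->items = NULL;
    }
    h->count = 0;
    h->capacity = 0;
}
```

With this shape, a heap that is constructed and destroyed without ever being used never hands a NULL pointer to the allocator, which is the case the orted appears to hit.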

But it raises the question: why is that data structure being initialized and freed if we never use it? Is it something inherent in libevent?


On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:

Unfortunately, this doesn't fully fix the problem -- I'm still getting
bad frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ********** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> New Revision: 20568
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
>
> Log:
> Make sure we correctly unregister all persistent events
> and signal handlers.
>
> Text files modified:
>   trunk/orte/orted/orted_main.c  |     8 ++++++++
>   trunk/orte/runtime/orte_wait.c |     4 ++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> Modified: trunk/orte/orted/orted_main.c
> ===================================================================
> --- trunk/orte/orted/orted_main.c     (original)
> +++ trunk/orte/orted/orted_main.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -754,6 +754,14 @@
>         exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
>     }
>
> +    /* Release all local signal handlers */
> +    opal_event_del(&term_handler);
> +    opal_event_del(&int_handler);
> +#ifndef __WINDOWS__
> +    opal_signal_del(&sigusr1_handler);
> +    opal_signal_del(&sigusr2_handler);
> +#endif  /* __WINDOWS__ */
> +
>     /* Finalize and clean up ourselves */
>     ret = orte_finalize();
>     exit(ret);
>
> Modified: trunk/orte/runtime/orte_wait.c
> ===================================================================
> --- trunk/orte/runtime/orte_wait.c    (original)
> +++ trunk/orte/runtime/orte_wait.c 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -517,8 +517,8 @@
>     /* define the event to fire when someone writes to the pipe */
>     opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
>
> -     /* Add it to the active events, without a timeout */
> -     opal_event_add(*event, NULL);
> +    /* Add it to the active events, without a timeout */
> +    opal_event_add(*event, NULL);
>
>     /* all done */
>     return ORTE_SUCCESS;
> _______________________________________________
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


--
Jeff Squyres
Cisco Systems



