FWIW: I have installed a temporary patch that allows the trunk to run by no
longer finalizing OPAL. Once the param system has been repaired, the patch will
be removed. Meantime, at least you can run the trunk.

On Dec 24, 2012, at 10:39 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Hi folks
> 
> This is a heads-up to all: it appears a recent commit has broken the trunk.
> I think it relates to something done to the MCA parameter system. When
> running across multiple nodes, the daemons segfault on finalize with this
> stack trace:
> 
> (gdb) where
> #0  0x0000003dc4477e92 in _int_free () from /lib64/libc.so.6
> #1  0x00007f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
> #2  0x00007f18a163ab41 in opal_obj_run_destructors (object=0x118d940) at ../../../opal/class/opal_object.h:448
> #3  0x00007f18a163cb94 in mca_base_param_finalize () at mca_base_param.c:853
> #4  0x00007f18a1609c06 in opal_finalize_util () at runtime/opal_finalize.c:69
> #5  0x00007f18a1609cbc in opal_finalize () at runtime/opal_finalize.c:155
> #6  0x00007f18a18e366b in orte_finalize () at runtime/orte_finalize.c:107
> #7  0x00007f18a1911313 in orte_daemon (argc=35, argv=0x7ffffd7ea8b8) at orted/orted_main.c:834
> #8  0x000000000040091a in main (argc=35, argv=0x7ffffd7ea8b8) at orted.c:62
> (gdb) up
> #1  0x00007f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
> 1982          free(p->mbp_env_var_name);
> 
> (gdb) print array[i]
> $2 = {mbp_super = {obj_magic_id = 0, obj_class = 0x7f18a18c6460,
>     obj_reference_count = 1, cls_init_file_name = 0x7f18a169d04e "mca_base_param.c",
>     cls_init_lineno = 1154}, mbp_type = MCA_BASE_PARAM_TYPE_STRING,
>   mbp_type_name = 0x1185110 "\300O\030\001", mbp_component_name = 0x0,
>   mbp_param_name = 0x1185130 "", mbp_full_name = 0x1185150 "orte_debugger_test_daemon",
>   mbp_synonyms = 0x0, mbp_internal = false, mbp_read_only = false,
>   mbp_deprecated = false, mbp_deprecated_warning_shown = true,
>   mbp_help_msg = 0x11850a0 "Name of the executable to be used to simulate a debugger colaunch (relative or absolute path)",
>   mbp_env_var_name = 0x1185180 "\020P\030\001",
>   mbp_default_value = {intval = 0, stringval = 0x0}, mbp_file_value_set = false,
>   mbp_file_value = {intval = 0, stringval = 0x0}, mbp_source_file = 0x0,
>   mbp_override_value_set = false, mbp_override_value = {intval = 0, stringval = 0x0}}
> 
> As you can see, the mbp_env_var_name field is trash (and so is mbp_type_name),
> so the destructor's attempt to free that field crashes inside _int_free().
> 
> I believe it was Nathan who last touched this area, so perhaps he could take
> a gander and see what happened? Meantime, I'm afraid the trunk is down.
> 
> Thanks
> Ralph
> 
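A quick way to look closer at the "trash" in the dump: the bytes gdb printed for mbp_type_name ("\300O\030\001") and mbp_env_var_name ("\020P\030\001"), read as a little-endian integer, come out as 0x01184fc0 and 0x01185010, i.e. heap addresses just below the fields' own addresses (0x1185110 and 0x1185180). That is the pattern one would expect if those strings had already been freed and glibc's malloc had written its free-list pointers into them, which would make the crash at mca_base_param.c:1982 a double free rather than a wild pointer. This is only an inference from the dump, not something confirmed in this thread. The helpers le32_from_bytes and print_dump_decoding below are hypothetical, written only to show the decoding, and they assume a little-endian x86-64 host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reassemble four bytes as a little-endian 32-bit value
 * (assumes an x86-64, i.e. little-endian, host). gdb stops printing a char*
 * at the first NUL byte, so a 64-bit heap pointer such as 0x0000000001184fc0
 * shows up as just its four low-order, non-zero bytes. */
static uint32_t le32_from_bytes(const unsigned char *b)
{
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: print the decoded values of the two garbage strings
 * from the gdb dump (octal escapes copied exactly as gdb displayed them). */
static void print_dump_decoding(void)
{
    static const unsigned char type_name[] = "\300O\030\001";    /* stored at 0x1185110 */
    static const unsigned char env_var_name[] = "\020P\030\001"; /* stored at 0x1185180 */

    printf("mbp_type_name    -> 0x%08x\n", (unsigned)le32_from_bytes(type_name));
    printf("mbp_env_var_name -> 0x%08x\n", (unsigned)le32_from_bytes(env_var_name));
}
```

Calling print_dump_decoding() from a main() prints 0x01184fc0 and 0x01185010. Running the daemons under a heap checker such as Valgrind's Memcheck would be the direct way to confirm or rule out a double free.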

