I don't believe we support changing the value of an MCA param on the fly.
You'd need to transfer it to an appropriate-level global that you can change
as required.
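
Roughly what I mean, as a standalone sketch rather than actual OPAL code. All
of the demo_* names and the DEMO_MCA_... environment name are made up for
illustration, and plain getenv()/setenv() stand in for the MCA var system. It
assumes the flag is copied from the environment exactly once at register time,
which is the behavior described below: a later setenv() is never seen by
select, while writing the global directly is.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical environment name, standing in for the MCA param's env form. */
#define DEMO_ENV_NAME "DEMO_MCA_crs_base_do_not_select"

/* Stand-in for the global that registration fills in. */
static bool demo_do_not_select = false;

/* Set the environment form of the flag (what the opal_setenv() calls do). */
static int demo_set_env(const char *value)
{
    return setenv(DEMO_ENV_NAME, value, 1);
}

/* Registration: the environment is read exactly once, here. */
static void demo_register(void)
{
    const char *val = getenv(DEMO_ENV_NAME);
    demo_do_not_select = (val != NULL && strcmp(val, "1") == 0);
}

/* Selection consults only the cached global, never the environment. */
static bool demo_select_runs(void)
{
    return !demo_do_not_select;
}

/* The suggested alternative: change the global itself. */
static void demo_enable_selection(void)
{
    demo_do_not_select = false;
}

/* Walk through the sequence from the original mail. Returns 0 on success. */
static int demo_walkthrough(void)
{
    if (demo_set_env("1") != 0) {
        return -1;
    }
    demo_register();
    assert(!demo_select_runs());   /* selection is off after register */

    if (demo_set_env("0") != 0) {
        return -1;
    }
    assert(!demo_select_runs());   /* env changed, cached global did not */

    demo_enable_selection();
    assert(demo_select_runs());    /* writing the global takes effect */
    return 0;
}
```

Calling demo_walkthrough() from a main() exercises all three steps. The point
is only that the second environment change comes too late to matter, so the
flag has to live somewhere both sides can reach.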

On Mar 14, 2014, at 2:05 PM, Adrian Reber <adr...@lisas.de> wrote:

> I am now trying to run orte-restart. As far as I understand it,
> orte-restart analyzes the checkpoint metadata and then tries to exec()
> mpirun, which then starts opal-restart. During the startup of
> opal-restart (in initialize()), detection of the best CRS module is
> disabled:
> 
>    /* 
>     * Turn off the selection of the CRS component,
>     * we need to do that later
>     */
>    (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>    opal_setenv(tmp_env_var,
>                "1", /* turn off the selection */
>                true, &environ);
>    free(tmp_env_var);
>    tmp_env_var = NULL;
> 
> This seems to work. Later when actually selecting the correct CRS module
> to restart the checkpointed process the selection is enabled again:
> 
>    /* Re-enable the selection of the CRS component,
>     * so we can choose the right one */
>    (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>    opal_setenv(tmp_env_var,
>                "0", /* turn on the selection */
>                true, &environ);
>    free(tmp_env_var);
>    tmp_env_var = NULL;
> 
> This does not seem to have any effect, and the reason is fairly obvious.
> The MCA variable crs_base_do_not_select is registered during
> opal_crs_base_register() and written to the bool variable
> opal_crs_base_do_not_select only once (during register). Later,
> opal_crs_base_select() checks this bool to decide whether select should
> run, and since it is only set during register, it never changes. So from
> the code flow it cannot work, which is probably the result of one of the
> rewrites since C/R was introduced.
> 
> To fix this I am trying to read the value of the MCA variable
> opal_crs_base_do_not_select during opal_crs_base_select() like this:
> 
> idx = mca_base_var_find("opal", "crs", "base", "do_not_select");
> mca_base_var_get_value(idx, &value, NULL, NULL);
> 
> This also seems to work: the value I read back changes when I change the
> first opal_setenv() during initialize(). The problem I am seeing is that
> the second opal_setenv() (back to 0) cannot be detected using
> mca_base_var_get_value().
> 
> So my question is: what is the preferred way to read and write MCA
> variables so that they can be accessed from the different modules? Is the
> existing code still correct? There is also mca_base_var_set_value(); should
> I rather use this to set 'opal_crs_base_do_not_select'? I was, however, not
> able to use mca_base_var_set_value() without a segfault. There are not many
> uses of mca_base_var_set_value() in the existing code, and none of them
> uses a bool variable.
> 
> I also discovered that I can just access the global C variable
> 'opal_crs_base_do_not_select' from opal-restart.c as well as from
> opal_crs_base_select(). This also works, and it would solve my problem of
> setting and reading MCA variables.
> 
>               Adrian
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/03/14347.php
