Thanks, that was the information I needed.

On Fri, Mar 14, 2014 at 10:18:06PM +0000, Hjelm, Nathan T wrote:
> The preferred way is to use mca_base_var_find and then call
> mca_base_var_[set|get]_value. For performance's sake we only look at the
> environment when the variable is registered.
> 
> -Nathan
> 
> Please excuse the horrible Outlook top-posting. OWA sucks.
> 
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Adrian Reber 
> [adr...@lisas.de]
> Sent: Friday, March 14, 2014 3:05 PM
> To: de...@open-mpi.org
> Subject: [OMPI devel] usage of mca variables in orte-restart
> 
> I am now trying to run orte-restart. As far as I understand it
> orte-restart analyzes the checkpoint metadata and then tries to exec()
> mpirun which then starts opal-restart. During the startup of
> opal-restart (during initialize()) detection of the best CRS module is
> disabled:
> 
>     /*
>      * Turn off the selection of the CRS component,
>      * we need to do that later
>      */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "1", /* turn off the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
> 
> This seems to work. Later when actually selecting the correct CRS module
> to restart the checkpointed process the selection is enabled again:
> 
>     /* Re-enable the selection of the CRS component,
>      * so we can choose the right one */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "0", /* turn on the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
> 
> This does not seem to have an effect. The reason why it does not work
> is pretty obvious: the MCA variable crs_base_do_not_select is registered
> during opal_crs_base_register() and written to the bool variable
> opal_crs_base_do_not_select only once (during register). Later, in
> opal_crs_base_select(), this bool variable is checked to decide whether
> select should run, and as it is only changed during register it never
> changes. So from the code flow it cannot work, and this is probably the
> result of one of the rewrites since C/R was introduced.
> 
> To fix this I am trying to read the value of the MCA variable
> opal_crs_base_do_not_select during opal_crs_base_select() like this:
> 
>  idx = mca_base_var_find("opal", "crs", "base", "do_not_select");
>  mca_base_var_get_value(idx, &value, NULL, NULL);
> 
> This also seems to work, because the value read changes if I change the
> first opal_setenv() during initialize(). The problem I am seeing is that
> the second opal_setenv() (back to 0) cannot be detected using
> mca_base_var_get_value().
> 
> So my question is: what is the preferred way to read and write MCA
> variables to access them in the different modules? Is the existing
> code still correct? There is also mca_base_var_set_value(); should I
> rather use this to set 'opal_crs_base_do_not_select'? I was, however,
> not able to use mca_base_var_set_value() without a segfault. There are
> not many uses of mca_base_var_set_value() in the existing code and none
> of them uses a bool variable.
> 
> I also discovered I can just access the global C variable
> 'opal_crs_base_do_not_select' from opal-restart.c as well as from
> opal_crs_base_select(). This also works, and it would solve my problem
> of setting and reading MCA variables.
> 
>                 Adrian
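
For the archive, the behaviour discussed above can be sketched with a small
stand-alone toy. This is not Open MPI code: every toy_* name and the
TOY_crs_base_do_not_select environment key are hypothetical, made up only to
illustrate the pattern. The assumption is simply what Nathan describes: the
environment is read once, at registration, so a later setenv() is invisible,
while an explicit set through the found index does change the backing bool.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy registry, NOT Open MPI code: every toy_* name is hypothetical. */
#define TOY_MAX_VARS 8

typedef struct {
    const char *name;  /* variable name, doubling as the environment key */
    bool *storage;     /* caller-owned bool that backs the variable */
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];
static int toy_var_count = 0;

/* Register a bool variable. The environment is read here and nowhere else. */
static int toy_var_register(const char *name, bool *storage)
{
    const char *env;

    assert(name != NULL && storage != NULL);
    if (toy_var_count >= TOY_MAX_VARS) {
        return -1;
    }
    env = getenv(name);
    if (env != NULL) {
        *storage = (strcmp(env, "0") != 0);
    }
    toy_vars[toy_var_count].name = name;
    toy_vars[toy_var_count].storage = storage;
    return toy_var_count++;
}

/* Look up a registered variable by name; returns its index or -1. */
static int toy_var_find(const char *name)
{
    for (int i = 0; i < toy_var_count; i++) {
        if (strcmp(toy_vars[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Write through the registry so the backing storage changes. */
static int toy_var_set_bool(int idx, bool value)
{
    if (idx < 0 || idx >= toy_var_count) {
        return -1;
    }
    *toy_vars[idx].storage = value;
    return 0;
}

/* Read the current value from the backing storage. */
static int toy_var_get_bool(int idx, bool *value)
{
    if (idx < 0 || idx >= toy_var_count) {
        return -1;
    }
    *value = *toy_vars[idx].storage;
    return 0;
}

static bool toy_do_not_select = false;

/* Replays the sequence from the thread; returns 0 if it behaves as described. */
static int toy_demo(void)
{
    const char *name = "TOY_crs_base_do_not_select";
    bool value = false;
    int idx;

    setenv(name, "1", 1);                        /* before registration: seen */
    idx = toy_var_register(name, &toy_do_not_select);
    if (idx < 0 || !toy_do_not_select) {
        return 1;
    }

    setenv(name, "0", 1);                        /* after registration: ignored */
    toy_var_get_bool(toy_var_find(name), &value);
    if (!value) {
        return 2;                                /* would mean the env was re-read */
    }

    toy_var_set_bool(toy_var_find(name), false); /* explicit set: takes effect */
    toy_var_get_bool(idx, &value);
    return (value || toy_do_not_select) ? 3 : 0;
}
```

Calling toy_demo() from a main() should return 0 under these assumptions. In
the real code base the corresponding calls would presumably be
mca_base_var_find() followed by mca_base_var_set_value() or
mca_base_var_get_value(), whose exact arguments should be checked against the
headers rather than taken from this toy.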
