Thanks, that was the information I needed.
On Fri, Mar 14, 2014 at 10:18:06PM +0000, Hjelm, Nathan T wrote:
> The preferred way is to use mca_base_var_find and then call
> mca_base_var_[set|get]_value. For performance's sake we only look at the
> environment when the variable is registered.
>
> -Nathan
>
> Please excuse the horrible Outlook top-posting. OWA sucks.
>
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Adrian Reber [adr...@lisas.de]
> Sent: Friday, March 14, 2014 3:05 PM
> To: de...@open-mpi.org
> Subject: [OMPI devel] usage of mca variables in orte-restart
>
> I am now trying to run orte-restart. As far as I understand it,
> orte-restart analyzes the checkpoint metadata and then tries to exec()
> mpirun, which then starts opal-restart. During the startup of
> opal-restart (in initialize()), detection of the best CRS module is
> disabled:
>
>     /*
>      * Turn off the selection of the CRS component,
>      * we need to do that later
>      */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "1", /* turn off the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
>
> This seems to work. Later, when actually selecting the correct CRS module
> to restart the checkpointed process, the selection is enabled again:
>
>     /* Re-enable the selection of the CRS component, so we can choose the
>        right one */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "0", /* turn on the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
>
> This does not seem to have an effect. One reason why it does not work
> is pretty obvious. The MCA variable crs_base_do_not_select is registered
> during opal_crs_base_register() and written to the bool variable
> opal_crs_base_do_not_select only once (during register).
> Later, in opal_crs_base_select(), this bool variable is queried to decide
> whether select should run, and as it is only set during register it never
> changes. So from the code flow it cannot work, and this is probably the
> result of one of the rewrites since C/R was introduced.
>
> To fix this I am trying to read the value of the MCA variable
> opal_crs_base_do_not_select during opal_crs_base_select() like this:
>
>     idx = mca_base_var_find("opal", "crs", "base", "do_not_select");
>     mca_base_var_get_value(idx, &value, NULL, NULL);
>
> This also seems to work, because the value I read is different if I change
> the first opal_setenv() in initialize(). The problem I am seeing is that
> the second opal_setenv() (back to 0) cannot be detected using
> mca_base_var_get_value().
>
> So my question is: what is the preferred way to read and write MCA
> variables to access them in the different modules? Is the existing
> code still correct? There is also mca_base_var_set_value(); should I rather
> use this to set 'opal_crs_base_do_not_select'? I was, however, not able
> to use mca_base_var_set_value() without a segfault. There are not many
> uses of mca_base_var_set_value() in the existing code and none of them
> uses a bool variable.
>
> I also discovered I can just access the global C variable
> 'opal_crs_base_do_not_select' from opal-restart.c as well as from
> opal_crs_base_select(). This also works.
> This would solve my problem of setting and reading MCA variables.
>
> Adrian
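
For the archive: the behaviour Nathan describes (the environment is only read when the variable is registered) can be reproduced without Open MPI at all. Below is a minimal sketch in plain C, assuming a POSIX setenv(). Every name starting with demo_ or DEMO_ is hypothetical and only stands in for the register and select steps discussed above; none of it is Open MPI API.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the bool storage behind the MCA variable. */
static bool demo_do_not_select = false;

/* Hypothetical stand-in for registration: the environment is read here,
 * once, and the result is cached in the bool above. */
void demo_register(void)
{
    const char *val = getenv("DEMO_crs_base_do_not_select");
    demo_do_not_select = (val != NULL && strcmp(val, "1") == 0);
}

/* Hypothetical stand-in for select: it only consults the cached bool and
 * never looks at the environment again. */
bool demo_select_runs(void)
{
    return !demo_do_not_select;
}

/* Replays the sequence from the message: set "1", register, set "0",
 * then select. If write_cached_value is true, the cached bool is also
 * written directly after the second setenv(). */
bool demo_replay(bool write_cached_value)
{
    int rc = setenv("DEMO_crs_base_do_not_select", "1", 1);
    assert(rc == 0);
    demo_register();

    rc = setenv("DEMO_crs_base_do_not_select", "0", 1);
    assert(rc == 0);
    (void) rc;

    if (write_cached_value) {
        demo_do_not_select = false;  /* update the stored value itself */
    }
    return demo_select_runs();
}
```

Calling demo_replay(false) reports that selection stays off, because the second setenv() never reaches the cached bool. Calling demo_replay(true) reports that selection runs again, because the stored value itself was updated. In this toy model that is the difference between changing the environment and changing the variable's value after registration.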