I don't believe we support changing the value of an MCA param on the fly. You'd need to transfer it to a global at the appropriate level, which you can then change as required.
On Mar 14, 2014, at 2:05 PM, Adrian Reber <adr...@lisas.de> wrote:

> I am now trying to run orte-restart. As far as I understand it,
> orte-restart analyzes the checkpoint metadata and then tries to exec()
> mpirun, which then starts opal-restart. During the startup of
> opal-restart (during initialize()), detection of the best CRS module is
> disabled:
>
>     /*
>      * Turn off the selection of the CRS component,
>      * we need to do that later
>      */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "1", /* turn off the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
>
> This seems to work. Later, when actually selecting the correct CRS module
> to restart the checkpointed process, the selection is enabled again:
>
>     /* Re-enable the selection of the CRS component, so we can choose the
>        right one */
>     (void) mca_base_var_env_name("crs_base_do_not_select", &tmp_env_var);
>     opal_setenv(tmp_env_var,
>                 "0", /* turn on the selection */
>                 true, &environ);
>     free(tmp_env_var);
>     tmp_env_var = NULL;
>
> This does not seem to have an effect. One reason why it does not work
> is pretty obvious. The MCA variable crs_base_do_not_select is registered
> during opal_crs_base_register() and written to the bool variable
> opal_crs_base_do_not_select only once (during register). Later, in
> opal_crs_base_select(), this bool variable is queried to decide whether
> select should run, and since it is only set during register it never
> changes. So from the code flow it cannot work, and this is probably the
> result of one of the rewrites since C/R was introduced.
>
> To fix this I am trying to read the value of the MCA variable
> opal_crs_base_do_not_select during opal_crs_base_select() like this:
>
>     idx = mca_base_var_find("opal", "crs", "base", "do_not_select");
>     mca_base_var_get_value(idx, &value, NULL, NULL);
>
> This also seems to work, because the value read is different if I change
> the first opal_setenv() during initialize().
> The problem I am seeing is that the
> second opal_setenv() (back to 0) cannot be detected using
> mca_base_var_get_value().
>
> So my question is: what is the preferred way to read and write MCA
> variables so that they can be accessed from the different modules? Is the
> existing code still correct? There is also mca_base_var_set_value(); should
> I rather use it to set 'opal_crs_base_do_not_select'? I was, however, not
> able to use mca_base_var_set_value() without a segfault. There are not many
> uses of mca_base_var_set_value() in the existing code, and none of them uses
> a bool variable.
>
> I also discovered that I can just access the global C variable
> 'opal_crs_base_do_not_select'
> from opal-restart.c as well as from opal_crs_base_select(). This also works
> and would solve my problem of setting and reading MCA variables.
>
> Adrian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14347.php