I have another question about checkpoint/restart of Open MPI.

Source file: ompi/runtime/ompi_cr.c
Function name: notify_collectives

The notify_collectives function appears to loop over all communicators and,
for each one, find its collective modules and call their ft_event functions.
The array "modules" used in this loop has NUM_COLLECTIVES (16) elements.

The relevant source code is as follows:

#define NUM_COLLECTIVES 16

#define SIGNAL(comm, modules, highest_module, msg, ret, func)   \
    do {                                                        \
        bool found = false;                                     \
        int k;                                                  \
        mca_coll_base_module_t *my_module =                     \
            comm->c_coll.coll_ ## func ## _module;              \
        if (NULL != my_module) {                                \
            for (k = 0 ; k < highest_module ; ++k) {            \
                if (my_module == modules[k]) found = true;      \
            }                                                   \
            if (!found) {                                       \
                modules[highest_module++] = my_module;          \
                if (NULL != my_module->ft_event) {              \
                    ret = my_module->ft_event(msg);             \
                }                                               \
            }                                                   \
        }                                                       \
    } while (0)

static int
notify_collectives(int msg)
{
    mca_coll_base_module_t *modules[NUM_COLLECTIVES];
    int i, max, ret, highest_module = 0;

    memset(&modules, 0, sizeof(mca_coll_base_module_t*) * NUM_COLLECTIVES);

    max = opal_pointer_array_get_size(&ompi_mpi_communicators);
    for (i = 0 ; i < max ; ++i) {
        ompi_communicator_t *comm =
            (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
        if (NULL == comm) continue;

        SIGNAL(comm, modules, highest_module, msg, ret, allgather);
        SIGNAL(comm, modules, highest_module, msg, ret, allgatherv);

Inside the for loop, whenever the SIGNAL macro finds a module that is not yet
in "modules", it stores it there and increments the subscript "highest_module".

I have two questions about this code.

1. I think the variable "highest_module", which is the subscript of the array
   "modules", should be reset for every communicator.
   If many communicators are created, does the code access elements beyond
   the bounds of the array "modules"?

2. I think the code would work correctly, even when many communicators are
   created, if "highest_module" were reset to 0 inside the for loop.
   Is that correct?
   For example:

    for (i = 0 ; i < max ; ++i) {
        ompi_communicator_t *comm =
            (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);

        highest_module = 0;      /* <- added: reset the subscript "highest_module" */

        if (NULL == comm) continue;
