Hmmm...well, the way that function -used- to work was it returned an error
code, and had the index as a *int param in the function call. Tim P changed
it awhile back (don't remember exactly why, but it was when he moved the
pointer_array code from orte to opal), and I'm not sure the fixes it
required were ever propagated everywhere (I occasionally run across them in
ORTE, though I think I've got them all now).

My point: only real fix may be to go back to the old API and quit
overloading the return code.


On Tue, May 18, 2010 at 12:32 PM, Rolf vandeVaart <
rolf.vandeva...@oracle.com> wrote:

>  I think we are almost saying the same thing. But to be sure, I will
> restate. The call to opal_pointer_array_add() can return either an index
> (which I assume is a positive integer, maybe also 0?) or
> OPAL_ERR_OUT_OF_RESOURCE (which is a -2) if it cannot malloc anymore space
> in the table.  So, I guess I agree that the original code was wrong as it
> never would have detected the error since OMPI_ERROR !=
> OPAL_ERR_OUT_OF_RESOURCE.  (-1 != -2)
>
> Since we are overloading the return value, it seems like the only thing we
> could do is something like this:
>
> if (new_group->grp_f_to_c_index < 0)
>    error();
>
> But that does not follow the SOS logic as the key is that we want to
> compare to OMPI_SUCCESS (I think).  Also, for what it is worth, the setting
> of the grp_f_to_c_index happens in the group constructor, so we cannot get
> at the return value of opal_pointer_array_add() except by looking at the
> grp_f_to_c value after the object is constructed.  I am not sure the correct
> way to handle this.
>
> Rolf
>
> On 05/18/10 14:02, Jeff Squyres wrote:
>
> Looks like the comparison to OMPI_ERROR worked by accident -- just because it 
> happened to have a value of -1.
>
> The *_f_to_c_index values are the return value from a call to 
> opal_point_array_add().  This value will either be non-negative or -1.  -1 
> indicates a failure.  It's not an *_ERR_* code -- it's a -1 index value.  So 
> the comparisons should really have been to -1 in the first place.
>
> Rolf / Abhishek -- can you fix?  Did this happen in other *_f_to_c_index 
> member variable comparisons elsewhere?
>
>
>
> On May 18, 2010, at 1:25 PM, Rolf vandeVaart wrote:
>
>
>
>  I am getting SEGVs while running the IMB-MPI1 tests.  I believe the
> problem has to do with changes made to the group_init.c file last
> night.  The error occurs in the call to MPI_Comm_split.
>
>  There were 4 changes in the file that look like this:
> OLD:
> if (OMPI_ERROR == new_group->grp_f_to_c_index)
>
> NEW:
> if (OMPI_SUCCESS != new_group->grp_f_to_c_index)
>
> If I change it back, things work.  I understand the idea of changing the
> logic, but maybe that does not apply in this case?    When running with
> np=2, the value of new_group->grp_f_to_c_index is 4, thereby not
> equaling OMPI_SUCCESS and we end up in an error condition which results
> in a null pointer later on.
>
> Am I the only that has run into this?
>
> Rolf
>
>
> _______________________________________________
> devel mailing 
> listdevel@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

Reply via email to