Eric,

Thanks for the report. I used your example to replicate the issue, and I can 
confirm that it appears in all versions in debug mode. However, both the assert 
in the convertor code and your code are correct. The issue is more complex: it 
is triggered by a usage of the convertor that should have been prevented.

If I'm not mistaken, Edgar (CC'ed on this email) is the maintainer of that 
particular code path. Hopefully, he will be able to fix the code based on the 
following analysis.

The underlying issue is that when the convertor is created with no data to 
convert, it is automatically marked as COMPLETED. Once in this state, no 
further conversion calls should be made, or they will trigger the issue you 
encountered. Unfortunately, the code in OMPIO doesn't check whether there is 
more data to handle before calling opal_convertor_raw (a function which, as I 
said above, is not supposed to be called on a completed convertor). The 
function ompi_io_ompio_decode_datatype assumes that there is at least one 
segment in the file, which explains the call to opal_convertor_raw.

I modified opal_convertor_raw to accept the case where the convertor is 
already completed and to return the same value as opal_convertor_pack/unpack 
(r28305), so we now have a consistent interface for the convertor. However, 
this leads to a division by zero in the OMPIO layer, as the number of iovecs 
returned by opal_convertor_raw is now zero and this case is not handled. I hope 
Edgar will be able to fix that part.
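
To illustrate the kind of caller-side guard that is missing, here is a minimal 
sketch in C. It is a toy model, not the actual Open MPI code: every name in it 
(toy_convertor_t, toy_convertor_create, toy_convertor_raw, 
toy_bytes_per_segment) is a hypothetical stand-in, and the return value of 1 
for "completed" is only this sketch's convention. It shows a convertor that is 
marked completed when created with no data, a raw call that reports zero 
segments in that state instead of asserting, and a caller that checks the 
segment count before dividing by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: these are NOT the real Open MPI types or flags. */
#define TOY_CONVERTOR_COMPLETED 0x1u

typedef struct {
    uint32_t flags;      /* TOY_CONVERTOR_COMPLETED once nothing is left */
    size_t   remaining;  /* bytes that still have to be described */
} toy_convertor_t;

/* A convertor created with no data is marked completed immediately. */
static toy_convertor_t toy_convertor_create(size_t total_bytes)
{
    toy_convertor_t c = { 0u, total_bytes };
    if (0 == total_bytes) {
        c.flags |= TOY_CONVERTOR_COMPLETED;
    }
    return c;
}

/* Toy "raw" call. Returns 1 once the convertor is completed (a convention of
 * this sketch) and stores the number of segments produced in *iov_count.
 * On an already completed convertor it reports zero segments rather than
 * asserting. */
static int toy_convertor_raw(toy_convertor_t *c, size_t *iov_count)
{
    assert(NULL != c && NULL != iov_count);
    if (c->flags & TOY_CONVERTOR_COMPLETED) {
        *iov_count = 0;
        return 1;
    }
    /* Toy behaviour: describe the whole buffer as a single segment. */
    *iov_count = 1;
    c->remaining = 0;
    c->flags |= TOY_CONVERTOR_COMPLETED;
    return 1;
}

/* Caller-side guard: a zero segment count is handled explicitly, so the
 * division below can never be a division by zero. */
static size_t toy_bytes_per_segment(size_t total_bytes)
{
    toy_convertor_t c = toy_convertor_create(total_bytes);
    size_t iov_count = 0;

    (void) toy_convertor_raw(&c, &iov_count);
    if (0 == iov_count) {
        return 0;  /* nothing to write for this process */
    }
    return total_bytes / iov_count;
}
```

With this guard, a process that has nothing to write takes the early return in 
toy_bytes_per_segment, while a process with data goes through the division as 
before.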

  George.


On Apr 5, 2013, at 23:10 , Eric Chamberland <eric.chamberl...@giref.ulaval.ca> 
wrote:

> Hi all,
> 
> (Sorry, I have sent this to "users" but I should have sent it to "devel" list 
> instead.  Sorry for the mess...)
> 
> I have attached a very small example which raises an assertion.
> 
> The problem is arising from a process which does not have any element to 
> write in a file (and then in the MPI_File_set_view)...
> 
> You can see this "bug" with openmpi 1.6.3, 1.6.4 and 1.7.0 configured with:
> 
> ./configure --enable-mem-debug --enable-mem-profile --enable-memchecker
> --with-mpi-param-check --enable-debug
> 
> Just compile the given example (idx_null.cc) as-is with
> 
> mpicxx -o idx_null idx_null.cc
> 
> and run with 3 processes:
> 
> mpirun -n 3 idx_null
> 
> You can modify the example by commenting "#define WITH_ZERO_ELEMNT_BUG" to 
> see that everything is going well when all processes have something to write.
> 
> There is no "bug" if you use openmpi 1.6.3 (and higher) without the debugging 
> options.
> 
> Also, all is working well with mpich-3.0.3 configured with:
> 
> ./configure --enable-g=yes
> 
> 
> So, is this a wrong "assert" in openmpi?
> 
> Is there a real problem to use this example in a "release" mode?
> 
> Thanks,
> 
> Eric
> <idx_null.cc>

