Dominic, I can only recommend you write a small self-contained program that writes the data in parallel, and then checks from task 0 only that the data was written as you expected.
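Something along these lines could serve as a starting point (a minimal sketch only; nper, testfile.bin and the value pattern are placeholders, not taken from Dominic's code):

  program check_parallel_write
    use mpi
    implicit none
    integer, parameter :: nper = 4                 ! integers written per rank (placeholder)
    integer :: ierr, irank, nproc, iunit, i
    integer :: buf(nper)
    integer, allocatable :: filedata(:)
    integer(kind=MPI_OFFSET_KIND) :: disp
    integer :: status(MPI_STATUS_SIZE)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,irank,ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)

    buf = (/ (irank*nper+i, i=1,nper) /)           ! globally unique values

    ! every rank writes its own contiguous block of the file
    call MPI_FILE_OPEN(MPI_COMM_WORLD,'testfile.bin', &
         IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
    disp = int(irank,MPI_OFFSET_KIND)*nper*4       ! byte offset of this rank's block
    call MPI_FILE_WRITE_AT_ALL(iunit,disp,buf,nper,MPI_INTEGER,status,ierr)
    call MPI_FILE_CLOSE(iunit,ierr)

    call MPI_BARRIER(MPI_COMM_WORLD,ierr)

    ! task 0 only: read everything back and check the expected pattern
    if (irank.eq.0) then
       allocate(filedata(nper*nproc))
       call MPI_FILE_OPEN(MPI_COMM_SELF,'testfile.bin',MPI_MODE_RDONLY, &
            MPI_INFO_NULL,iunit,ierr)
       call MPI_FILE_READ_AT(iunit,0_MPI_OFFSET_KIND,filedata,nper*nproc, &
            MPI_INTEGER,status,ierr)
       call MPI_FILE_CLOSE(iunit,ierr)
       if (all( (/ (filedata(i).eq.i, i=1,nper*nproc) /) )) then
          print *,'data written as expected'
       else
          print *,'MISMATCH in written data'
       end if
    end if

    call MPI_FINALIZE(ierr)
  end program check_parallel_write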
Feel free to take some time reading MPI-IO tutorials. If you are still struggling with your code, I will try to help you once I can download and compile it. Also, since this does not look like an Open MPI bug, I recommend you post this kind of request to the users mailing list.

Cheers,

Gilles

Dominic Kedelty <dkede...@asu.edu> wrote:
>I am open to any suggestions to make the code better, especially if the way it's coded now is wrong.
>
>I believe what the MPI_TYPE_INDEXED is trying to do is this:
>
>I have a domain of, for example, 8 hexahedral elements (a 2x2x2 cell domain) that has 27 unique connectivity nodes (3x3x3 nodes). In this portion of the code it is trying to write the hexa cell labeling and its connectivity via nodes, and these elements can be spread across nprocs.
>
>The portion of the binary file that is being written should have this format:
>
>  [id_e1 id_e2 ... id_ne]
>
>This block of the file has nelems=12 4-byte binary integers.
>
>  n1_e1  n2_e1  ... n8_e1
>  n1_e2  n2_e2  ... n8_e2
>   .      .
>  n1_e12 n2_e12 ... n8_e12
>
>This block of the file has 8*nelems = 96 4-byte binary integers.
>
>It is not an irregular shape, since I know that I have an array hexa_ that has [id_e1 id_e2 id_e3 id_e4] on rank 3 and [id_e5 id_e6 id_e7 id_e8] on rank 1, etc., and for the most part every processor has the same number of elements (that is, unless I am running on an odd number of processors).
>
>Note: I am using arbitrary ranks in the example because I am not sure whether rank 0 gets the first ids.
>
>If MPI_Type_contiguous would work better I am open to switching to that.
>
>On Tue, Mar 22, 2016 at 11:06 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>Dominic,
>
>with MPI_Type_indexed, array_of_displacements is an int[], so yes, there is a risk of overflow.
>
>On the other hand, with MPI_Type_create_hindexed, array_of_displacements is an MPI_Aint[].
>
>Note:
>  array_of_displacements
>    Displacement for each block, in multiples of oldtype extent for MPI_Type_indexed and bytes for MPI_Type_create_hindexed (array of integer for MPI_TYPE_INDEXED, array of MPI_Aint for MPI_TYPE_CREATE_HINDEXED).
>
>I do not fully understand what you are trying to achieve ...
>
>  MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8/), map=(/0, 8, 16, 24, ..... , 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
>at first glance, this is equivalent to
>
>  MPI_Type_contiguous(1024^3, MPI_INTEGER, file_view_hexa_conn, ierr)
>
>so unless your data has a non-regular shape, I recommend you use other MPI primitives to create your datatype. This should be much more efficient, and less prone to integer overflow.
>
>Cheers,
>
>Gilles
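For reference, a sketch (not part of the original code) of how the connectivity view could be built with MPI_TYPE_CREATE_HINDEXED, so that the displacements are MPI_Aint byte offsets and cannot overflow a 4-byte integer. It assumes hexa_ holds the global ids of the elements owned by this rank and ncells_hexa_ their count, as in the message quoted below; the subroutine name is illustrative only.

  ! Sketch only: connectivity file view built with byte (MPI_Aint) displacements.
  subroutine build_conn_view(ncells_hexa_, hexa_, fileview_hexa_conn, ierr)
    use mpi
    implicit none
    integer, intent(in)  :: ncells_hexa_
    integer, intent(in)  :: hexa_(ncells_hexa_)   ! global hexa ids owned by this rank
    integer, intent(out) :: fileview_hexa_conn, ierr
    integer, allocatable :: blocklength(:)
    integer(kind=MPI_ADDRESS_KIND), allocatable :: disp(:)
    integer :: i

    allocate(blocklength(ncells_hexa_), disp(ncells_hexa_))
    blocklength = 8                               ! 8 node ids per hexa
    do i = 1, ncells_hexa_
       ! byte offset of element hexa_(i): (id-1) blocks of 8 nodes of 4 bytes each
       disp(i) = int(hexa_(i)-1, MPI_ADDRESS_KIND) * 8 * 4
    end do
    call MPI_TYPE_CREATE_HINDEXED(ncells_hexa_, blocklength, disp, MPI_INTEGER, fileview_hexa_conn, ierr)
    call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)
    deallocate(blocklength, disp)
  end subroutine build_conn_view

If the ids owned by each rank are in fact contiguous, a plain MPI_Type_contiguous per rank plus an appropriate displacement in MPI_FILE_SET_VIEW would be simpler still, as suggested above.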
>On 3/23/2016 2:50 PM, Dominic Kedelty wrote:
>Hi Gilles,
>
>I believe I have found the problem. Initially I thought it may have been an MPI issue, since it showed up inside an MPI function. However, now I am sure that the problem has to do with an overflow of 4-byte signed integers.
>
>I am dealing with computational domains that have a little more than a billion cells (1024^3 cells). However, I am still within the limits of the 4-byte integer. The area where I am running into the problem is below (I have shortened the code):
>
>  ! Fileviews
>  integer :: fileview_hexa
>  integer :: fileview_hexa_conn
>
>  integer, dimension(:), allocatable :: blocklength
>  integer, dimension(:), allocatable :: map
>  integer(KIND=8) :: size
>  integer(KIND=MPI_OFFSET_KIND) :: disp   ! MPI_OFFSET_KIND seems to be 8 bytes
>
>  allocate(map(ncells_hexa_),blocklength(ncells_hexa_))
>  map = hexa_-1
>
>hexa_ is an array of 4-byte integers that labels the local hexa elements on a given rank. The max this number can be in my current code is 1024^3, and the min is 1.
>
>  blocklength = 1
>  call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_REAL_SP,fileview_hexa,ierr)
>
>MPI_REAL_SP is used for the 4-byte scalar data that is going to be written to the file (i.e. a temperature scalar stored at a given hexa cell).
>
>  call MPI_TYPE_COMMIT(fileview_hexa,ierr)
>  map = map * 8
>
>Here is where the problems arise. The map is being multiplied by 8 because the hexa cell node connectivity needs to be written. The node numbering that is being written to the file needs to be 4 bytes, and the max node number can be held within a 4-byte signed integer. But since I have to map 8*1024^3 displacements to be written, map needs to be integer(kind=8).
>
>  blocklength = 8
>  call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_INTEGER,fileview_hexa_conn,ierr)
>
>  MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8/), map=(/0, 8, 16, 24, ..... , 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
>Would this be a correct way to declare the new datatype file_view_hexa_conn? In that call, blocklength would be a 4-byte integer array and map would be an 8-byte integer array. To be clear, the code currently has both map and blocklength as 4-byte integer arrays.
>
>  call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
>  deallocate(map,blocklength)
>
>  ....
>
>  disp = disp+84
>  call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa,"native",MPI_INFO_NULL,ierr)
>  call MPI_FILE_WRITE_ALL(iunit,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
>
>I believe this could be wrong as well: fileview_hexa is being used here to write the 4-byte integer hexa labeling, but as you said, MPI_REAL_SP = MPI_INTEGER = 4 bytes, so it may be fine. It has not given any problems thus far.
>
>  disp = disp+4*ncells_hexa
>  call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa_conn,"native",MPI_INFO_NULL,ierr)
>  size = 8*ncells_hexa_
>  call MPI_FILE_WRITE_ALL(iunit,conn_hexa,size,MPI_INTEGER,status,ierr)
>
>Hopefully that is enough information about the issue. Now my questions:
>  Does this implementation look correct?
>  What kind should fileview_hexa and fileview_hexa_conn be?
>  Is it okay that map and blocklength are different integer kinds?
>  What does the blocklength parameter specify exactly? I played with this some, and changing the blocklength did not seem to change anything.
>
>Thanks for the help.
>
>-Dominic Kedelty
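On the blocklength question, here is a tiny illustration (not taken from the code above; it assumes use mpi is in scope). For MPI_TYPE_INDEXED, block i consists of blocklength(i) consecutive copies of oldtype, starting map(i) oldtype extents from the beginning of the view:

  integer :: newtype, ierr
  integer :: blocklength(2), map(2)

  blocklength = (/ 2, 3 /)   ! block 1 covers 2 integers, block 2 covers 3
  map         = (/ 0, 10 /)  ! block 1 starts at element 0, block 2 at element 10
  call MPI_TYPE_INDEXED(2, blocklength, map, MPI_INTEGER, newtype, ierr)
  call MPI_TYPE_COMMIT(newtype, ierr)
  ! newtype now selects integer elements 0,1 and 10,11,12 of the file view

With blocklength = 8 and map = (hexa_-1)*8, each block therefore covers the 8 node ids of one hexa element in the layout described above.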
>On Wed, Mar 16, 2016 at 12:02 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>Dominic,
>
>at first, you might try to add
>
>  call MPI_Barrier(comm,ierr)
>
>between
>
>  if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
>
>and
>
>  call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
>
>/* there might be a race condition, I am not sure about that */
>
>FWIW, the
>
>  STOP A configuration file is required
>
>error message is not coming from Open MPI. It might be indirectly triggered by an ompio bug/limitation, or even by a bug in your application.
>
>Did you get your application to work with another flavor of MPI? i.e. are you reporting an Open MPI bug, or are you asking for some help with your application (the bug could be either in your code or in Open MPI, and you do not know for sure)?
>
>I am a bit surprised you are using the same fileview_node type with both MPI_INTEGER and MPI_REAL_SP, but since they should be the same size, that might not be an issue.
>
>The subroutine depends on too many external parameters (nnodes_, fileview_node, ncells_hexa, ncells_hexa_, unstr2str, ...), so writing a simple reproducer might not be trivial.
>
>I recommend you first write a self-contained program that can be shown to reproduce the issue, and then I will investigate it. For that, you might want to dump the array sizes and the description of fileview_node in your application, and then hard-code them into your self-contained program.
>
>Also, how many nodes/tasks are you running, and what filesystem are you running on?
>
>Cheers,
>
>Gilles
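A quick way to dump the sizes mentioned above (and to confirm that MPI_INTEGER and the single-precision real type really have the same size) could look like the sketch below; fileview_node, nnodes_ and ncells_hexa_ are assumed to be the application's variables, and MPI_REAL4 stands in for whatever MPI_REAL_SP maps to.

  integer :: isize, rsize, tsize, ierr
  integer(kind=MPI_ADDRESS_KIND) :: lb, extent

  call MPI_TYPE_SIZE(MPI_INTEGER, isize, ierr)
  call MPI_TYPE_SIZE(MPI_REAL4, rsize, ierr)            ! stand-in for MPI_REAL_SP
  call MPI_TYPE_SIZE(fileview_node, tsize, ierr)
  call MPI_TYPE_GET_EXTENT(fileview_node, lb, extent, ierr)
  print *, 'MPI_INTEGER / MPI_REAL4 sizes :', isize, rsize
  print *, 'fileview_node size/lb/extent  :', tsize, lb, extent
  print *, 'nnodes_, ncells_hexa_         :', nnodes_, ncells_hexa_

These hard numbers can then be copied into the self-contained reproducer.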
>On 3/16/2016 3:05 PM, Dominic Kedelty wrote:
>Gilles,
>
>I do not have the latest MPICH available. I tested using Open MPI version 1.8.7 as well as MVAPICH2 version 1.9; both produced similar errors. I tried the mca flag that you provided, and it tells me that a configuration file is needed.
>
>All processes return:
>
>  STOP A configuration file is required
>
>I am attaching the subroutine of the code where I believe the problem is occurring.
>
>On Mon, Mar 14, 2016 at 6:25 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>Dominic,
>
>this is a ROMIO error message, and ROMIO is from the MPICH project.
>
>At first, I recommend you try the same test with the latest MPICH, in order to check whether the bug is indeed in ROMIO and has been fixed in the latest release (ompi is a few versions behind the latest ROMIO).
>
>Would you be able to post a trimmed version of your application that evidences the issue? That would be helpful to understand what is going on.
>
>You might also want to give a try to
>
>  mpirun --mca io ompio ...
>
>and see whether this helps. That being said, I think ompio is not considered production-ready on the v1.10 series of ompi.
>
>Cheers,
>
>Gilles
>
>On Tuesday, March 15, 2016, Dominic Kedelty <dkede...@asu.edu> wrote:
>I am getting the following error using Open MPI and I am wondering if anyone has a clue as to why it is happening. It is an error coming from Open MPI:
>
>  Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
>  Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
>  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
>  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
>
>Any help would be appreciated. Thanks.