I am open to any suggestions to make the code better, especially if the way it's coded now is wrong.

I believe what the MPI_TYPE_INDEXED is trying to do is this... I have a domain of, for example, 8 hexahedral elements (a 2x2x2 cell domain) that has 27 unique connectivity nodes (3x3x3 nodes). In this portion of the code it is trying to write the hexa cell labeling and its connectivity via nodes, and these elements can be spread across nprocs. The portion of the binary file that is being written should have this format:

  [id_e1 id_e2 ... id_ne]

This block of the file has nelems = 12 4-byte binary integers.

  n1_e1 n2_e1 ... n8_e1
  n1_e2 n2_e2 ... n8_e2
  .
  .
  n1_e12 n2_e12 ... n8_e12

This block of the file has 8*nelems (nelems = 12) 4-byte binary integers.

It is not an irregular shape, since I know that I have an array hexa_ that has [id_e1 id_e2 id_e3 id_e4] on rank 3 and [id_e5 id_e6 id_e7 id_e8] on rank 1... etc., and for the most part every processor has the same number of elements (that is, unless I am running on an odd number of processors). Note: I am using random ranks because I am not sure if rank 0 gets the first ids. If MPI_Type_contiguous would work better I am open to switching to that.
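To make the question concrete, here is roughly what I picture the MPI_Type_create_hindexed version looking like. This is only an untested sketch: the routine name build_conn_view and the map_bytes array are made up for this email, the other names mirror my code, and it assumes 8 nodes per hexa element and 4-byte integers, as described above.

  subroutine build_conn_view(ncells_hexa_, hexa_, fileview_hexa_conn, ierr)
    use mpi
    implicit none
    integer, intent(in)  :: ncells_hexa_            ! local number of hexa elements
    integer, intent(in)  :: hexa_(ncells_hexa_)     ! global ids of the local elements
    integer, intent(out) :: fileview_hexa_conn      ! filetype for the connectivity block
    integer, intent(out) :: ierr
    integer :: i
    integer, allocatable :: blocklength(:)
    integer(kind=MPI_ADDRESS_KIND), allocatable :: map_bytes(:)

    allocate(blocklength(ncells_hexa_), map_bytes(ncells_hexa_))
    blocklength = 8                                 ! 8 connectivity nodes per hexa element
    do i = 1, ncells_hexa_
       ! byte offset of element hexa_(i) in the connectivity block:
       ! 8 integers per element, 4 bytes per integer, computed in 8-byte arithmetic
       map_bytes(i) = int(hexa_(i)-1, MPI_ADDRESS_KIND) * 8 * 4
    end do

    ! displacements are MPI_Aint (bytes) here, so no 4-byte integer overflow
    call MPI_TYPE_CREATE_HINDEXED(ncells_hexa_, blocklength, map_bytes, &
         MPI_INTEGER, fileview_hexa_conn, ierr)
    call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)
    deallocate(blocklength, map_bytes)
  end subroutine build_conn_view

And if each rank really does own one contiguous slab of elements, I suppose this all collapses to a single call MPI_TYPE_CONTIGUOUS(8*ncells_hexa_, MPI_INTEGER, fileview_hexa_conn, ierr) with the per-rank offset folded into disp in MPI_FILE_SET_VIEW, which I think is what you were getting at.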
On Tue, Mar 22, 2016 at 11:06 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Dominik,
>
> with MPI_Type_indexed, array_of_displacements is an int[]
> so yes, there is a risk of overflow
>
> on the other hand, with MPI_Type_create_hindexed, array_of_displacements is an MPI_Aint[]
>
> note:
> array_of_displacements
> Displacement for each block, in multiples of oldtype extent for MPI_Type_indexed and bytes for MPI_Type_create_hindexed (array of integer for MPI_TYPE_INDEXED, array of MPI_Aint for MPI_TYPE_CREATE_HINDEXED).
>
> i do not fully understand what you are trying to achieve ...
>
> MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8/), map=(/0, 8, 16, 24, ....., 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
> at first glance, this is equivalent to
> MPI_Type_contiguous(1024^3, MPI_INTEGER, file_view_hexa_conn, ierr)
>
> so unless your data has a non-regular shape, i recommend you use other MPI primitives to create your datatype.
> This should be much more efficient, and less prone to integer overflow
>
> Cheers,
>
> Gilles
>
> On 3/23/2016 2:50 PM, Dominic Kedelty wrote:
>
> Hi Gilles,
>
> I believe I have found the problem. Initially I thought it may have been an mpi issue since it was internal to an mpi function. However, now I am sure that the problem has to do with an overflow of 4-byte signed integers.
>
> I am dealing with computational domains that have a little more than a billion cells (1024^3 cells). However, I am still within the limits of the 4-byte integer. The area where I am running into the problem is here; I have shortened the code:
>
> ! Fileviews
> integer :: fileview_hexa
> integer :: fileview_hexa_conn
>
> integer, dimension(:), allocatable :: blocklength
> integer, dimension(:), allocatable :: map
> integer(KIND=8) :: size
> integer(KIND=MPI_OFFSET_KIND) :: disp ! MPI_OFFSET_KIND seems to be 8 bytes
>
> allocate(map(ncells_hexa_),blocklength(ncells_hexa_))
> map = hexa_-1
>
> hexa_ is a 4-byte integer array that labels the local hexa elements at a given rank. The max this number can be in my current code is 1024^3, and the min is 1.
>
> blocklength = 1
> call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_REAL_SP,fileview_hexa,ierr)
>
> MPI_REAL_SP is used for 4-byte scalar data types that are going to be written to the file (i.e.
> a temperature scalar stored at a given hexa cell).
>
> call MPI_TYPE_COMMIT(fileview_hexa,ierr)
> map = map * 8
>
> Here is where problems arise. The map is being multiplied by 8 because the hexa cell node connectivity needs to be written. The node numbering that is being written to the file needs to be 4 bytes, and the max node numbering can be held within the 4-byte signed integer. But since I have to map 8*1024^3 displacements to be written, map needs to be integer(kind=8).
>
> blocklength = 8
> call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_INTEGER,fileview_hexa_conn,ierr)
>
> MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8/), map=(/0, 8, 16, 24, ....., 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
> Would this be a correct way to declare the new datatype file_view_hexa_conn? In this call blocklength would be a 4-byte integer array and map would be an 8-byte integer array. To be clear, the code currently has both map and blocklength as 4-byte integer arrays.
>
> call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
> deallocate(map,blocklength)
>
> ....
>
> disp = disp+84
> call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa,"native",MPI_INFO_NULL,ierr)
> call MPI_FILE_WRITE_ALL(iunit,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
>
> I believe this could be wrong as well: fileview_hexa is being used to write the 4-byte integer hexa labeling, but as you said, MPI_REAL_SP = MPI_INTEGER = 4 bytes, so it may be fine. It has not given any problems thus far.
>
> disp = disp+4*ncells_hexa
> call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa_conn,"native",MPI_INFO_NULL,ierr)
> size = 8*ncells_hexa_
> call MPI_FILE_WRITE_ALL(iunit,conn_hexa,size,MPI_INTEGER,status,ierr)
>
> Hopefully that is enough information about the issue. Now my questions:
>
> 1. Does this implementation look correct?
> 2. What kind should fileview_hexa and fileview_hexa_conn be?
> 3. Is it okay that map and blocklength are different integer kinds?
> 4. What does the blocklength parameter specify exactly? I played with this some and changing the blocklength did not seem to change anything.
>
> Thanks for the help.
>
> -Dominic Kedelty
>
> On Wed, Mar 16, 2016 at 12:02 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
>> Dominic,
>>
>> at first, you might try to add
>> call MPI_Barrier(comm,ierr)
>> between
>> if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
>> and
>> call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
>>
>> /* there might be a race condition, i am not sure about that */
>>
>> fwiw, the
>> STOP A configuration file is required
>> error message is not coming from OpenMPI.
>> it might be indirectly triggered by an ompio bug/limitation, or even a bug in your application.
>> did you get your application to work with another flavor of OpenMPI?
>> e.g. are you reporting an OpenMPI bug?
>> or are you asking for help with your application (the bug could either be in your code or in OpenMPI, and you do not know for sure)?
>>
>> i am a bit surprised you are using the same fileview_node type with both MPI_INTEGER and MPI_REAL_SP, but since they should be the same size, that might not be an issue.
>>
>> the subroutine depends on too many external parameters (nnodes_, fileview_node, ncells_hexa, ncells_hexa_, unstr2str, ...), so writing a simple reproducer might not be trivial.
>> i recommend you first write a self-contained program that can be shown to reproduce the issue, and then i will investigate it. For that, you might want to dump the array sizes and the description of fileview_node in your application, and then hard-code them into your self-contained program.
>> also, how many nodes/tasks are you running, and what filesystem are you running on?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 3/16/2016 3:05 PM, Dominic Kedelty wrote:
>>
>> Gilles,
>>
>> I do not have the latest mpich available. I tested using openmpi version 1.8.7 as well as mvapich2 version 1.9; both produced similar errors. I tried the mca flag that you had provided and it is telling me that a configuration file is needed.
>>
>> All processes return:
>>
>> STOP A configuration file is required
>>
>> I am attaching the subroutine of the code that I believe is where the problem is occurring.
>>
>> On Mon, Mar 14, 2016 at 6:25 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Dominic,
>>>
>>> this is a ROMIO error message, and ROMIO is from the MPICH project.
>>> at first, I recommend you try the same test with the latest mpich, in order to check whether the bug is indeed from romio, and has been fixed in the latest release.
>>> (ompi is a few versions behind the latest romio)
>>>
>>> would you be able to post a trimmed version of your application that evidences the issue?
>>> that will be helpful to understand what is going on.
>>>
>>> you might also want to give a try to
>>> mpirun --mca io ompio ...
>>> and see whether this helps.
>>> that being said, I think ompio is not considered production ready on the v1.10 series of ompi.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Tuesday, March 15, 2016, Dominic Kedelty <dkede...@asu.edu> wrote:
>>>
>>>> I am getting the following error using openmpi and I am wondering if anyone would have a clue as to why it is happening. It is an error coming from openmpi.
>>>>
>>>> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
>>>> Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
>>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
>>>>
>>>> Any help would be appreciated. Thanks.
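P.S. For reference, this is how I read the MPI_Barrier suggestion from the 3/16 message above, as an untested sketch (same variable names as in the subroutine I attached):

  ! untested: barrier between the delete and the collective open, so no rank
  ! re-creates the file before the root has finished deleting the old one
  if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
  call MPI_BARRIER(comm,ierr)
  call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)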