Dominic,

I can only recommend you write a small, self-contained program that writes the 
data in parallel, and then checks from task 0 only that the data was written as 
you expected.
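
Something along these lines, for example (an untested sketch; adapt the sizes, 
the file name, and the check to your own format):

program check_parallel_write
  use mpi
  implicit none
  integer, parameter :: nlocal = 4        ! integers written per task
  integer :: ierr, rank, nprocs, iunit, i
  integer :: status(MPI_STATUS_SIZE)
  integer :: buf(nlocal)
  integer, allocatable :: check(:)
  integer(KIND=MPI_OFFSET_KIND) :: disp

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! every task writes its own contiguous chunk of 4-byte integers
  ! (delete any old check.bin before running)
  buf = (/ (rank*nlocal + i, i = 1, nlocal) /)
  disp = int(rank, MPI_OFFSET_KIND) * nlocal * 4
  call MPI_FILE_OPEN(MPI_COMM_WORLD, 'check.bin', &
       IOR(MPI_MODE_WRONLY, MPI_MODE_CREATE), MPI_INFO_NULL, iunit, ierr)
  call MPI_FILE_WRITE_AT_ALL(iunit, disp, buf, nlocal, MPI_INTEGER, status, ierr)
  call MPI_FILE_CLOSE(iunit, ierr)
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  ! task 0 only: read the whole file back and check it
  if (rank == 0) then
     allocate(check(nprocs*nlocal))
     open(10, file='check.bin', access='stream', form='unformatted', status='old')
     read(10) check
     close(10)
     print *, 'data written as expected: ', all(check == (/ (i, i = 1, nprocs*nlocal) /))
  end if

  call MPI_FINALIZE(ierr)
end program check_parallel_write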

Feel free to take some time reading MPI-IO tutorials.

If you are still struggling with your code, I will try to help you once I can 
download and compile it.

Also, since this does not look like an Open MPI bug, I recommend you post this 
kind of request to the users mailing list.

Cheers,

Gilles

Dominic Kedelty <dkede...@asu.edu> wrote:
>I am open to any suggestions to make the code better, especially if the way 
>it's coded now is wrong.
>
>
>I believe what the MPI_TYPE_INDEXED call is trying to do is this:
>
>
>I have a domain of, for example, 8 hexahedral elements (a 2x2x2 cell domain) 
>that has 27 unique connectivity nodes (3x3x3 nodes).
>
>This portion of the code is trying to write the hexa cell labeling and its node 
>connectivity, and these elements can be spread across nprocs.
>
>
>The portion of the binary file that is being written should have this format:
>
>  [id_e1 id_e2 ... id_ne]
>
>This block of the file has nelems = 12 4-byte binary integers.
>
>  n1_e1  n2_e1  ... n8_e1
>  n1_e2  n2_e2  ... n8_e2
>    .      .          .
>  n1_e12 n2_e12 ... n8_e12
>
>This block of the file has 8*nelems = 96 4-byte binary integers.
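>
>To make the byte offsets concrete, a rough sketch (assuming this section starts 
>at byte 0 and using names purely for illustration):
>
>  id_block_start   = 0                             ! nelems 4-byte element ids
>  conn_block_start = 4*nelems                      ! connectivity follows the id block
>  conn_start_e     = conn_block_start + 4*8*(e-1)  ! 8 node ids of 4 bytes each for element e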
>
>
>It is not an irregular shape, since I know that I have an array hexa_ that has 
>[id_e1 id_e2 id_e3 id_e4] on rank 3 and [id_e5 id_e6 id_e7 id_e8] on rank 1, 
>etc., and for the most part every processor has the same number of elements 
>(unless I am running on an odd number of processors).
>
>Note: I am using arbitrary ranks in this example because I am not sure whether 
>rank 0 gets the first ids.
>
>
>If MPI_Type_contiguous would work better, I am open to switching to that.
>
>
>On Tue, Mar 22, 2016 at 11:06 PM, Gilles Gouaillardet <gil...@rist.or.jp> 
>wrote:
>
>Dominic,
>
>With MPI_Type_indexed, array_of_displacements is an int[], so yes, there is a 
>risk of overflow.
>
>With MPI_Type_create_hindexed, on the other hand, array_of_displacements is an 
>MPI_Aint[].
>
>Note (from the man page):
>
>  array_of_displacements
>      Displacement for each block, in multiples of oldtype extent for
>      MPI_Type_indexed and bytes for MPI_Type_create_hindexed (array of integer
>      for MPI_TYPE_INDEXED, array of MPI_Aint for MPI_TYPE_CREATE_HINDEXED).
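>
>If you do need the indexed pattern, here is an untested sketch of the hindexed 
>version with byte displacements (map_bytes is a new name, just for illustration; 
>the other names are taken from your snippet):
>
>  integer(KIND=MPI_ADDRESS_KIND), dimension(:), allocatable :: map_bytes
>
>  allocate(map_bytes(ncells_hexa_))
>  ! displacements in bytes: 8 node ids of 4 bytes each per element
>  map_bytes = int(hexa_ - 1, MPI_ADDRESS_KIND) * 8 * 4
>  blocklength = 8
>  call MPI_TYPE_CREATE_HINDEXED(ncells_hexa_, blocklength, map_bytes, &
>       MPI_INTEGER, fileview_hexa_conn, ierr)
>  call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)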
>
>
>I do not fully understand what you are trying to achieve...
>
>MPI_TYPE_INDEXED(1024^3, blocklength=(/8, 8, 8, ..., 8/), map=(/0, 8, 16, 24, 
>..., 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
>At first glance, this is equivalent to
>MPI_Type_contiguous(8*1024^3, MPI_INTEGER, file_view_hexa_conn, ierr)
>
>So unless your data has an irregular shape, I recommend you use other MPI 
>primitives to create your datatype. This should be much more efficient and less 
>prone to integer overflow.
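>
>For example, if each rank owns a contiguous range of element ids (your example 
>suggests it does, but your code does not guarantee it), you do not even need an 
>indexed type. An untested sketch, where first_elem_ (the first global element id 
>owned by this rank) and base_disp (the byte offset at which the connectivity 
>block starts) are assumed names:
>
>  ! assumes 8*ncells_hexa_ still fits in a default integer
>  call MPI_TYPE_CONTIGUOUS(8*ncells_hexa_, MPI_INTEGER, fileview_hexa_conn, ierr)
>  call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)
>  ! keep the per-rank offset in disp rather than in the datatype,
>  ! so no 4-byte displacement can overflow
>  disp = base_disp + 4_MPI_OFFSET_KIND * 8 * (first_elem_ - 1)
>  call MPI_FILE_SET_VIEW(iunit, disp, MPI_INTEGER, fileview_hexa_conn, &
>       "native", MPI_INFO_NULL, ierr)
>  call MPI_FILE_WRITE_ALL(iunit, conn_hexa, 8*ncells_hexa_, MPI_INTEGER, status, ierr)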
>
>Cheers,
>
>Gilles
>
>
>On 3/23/2016 2:50 PM, Dominic Kedelty wrote:
>
>Hi Gilles,
>
>I believe I have found the problem. Initially I thought it might have been an 
>MPI issue, since it occurred inside an MPI function. However, now I am sure that 
>the problem has to do with an overflow of 4-byte signed integers.
>
>I am dealing with computational domains that have a little more than a billion 
>cells (1024^3 cells). However, I am still within the limits of a 4-byte integer. 
>The area where I am running into the problem is below; I have shortened the 
>code:
>
> ! Fileviews
>integer :: fileview_hexa
>integer :: fileview_hexa_conn
>
>integer, dimension(:), allocatable :: blocklength
>integer, dimension(:), allocatable :: map
>
>integer(KIND=8) :: size
>integer(KIND=MPI_OFFSET_KIND) :: disp   ! MPI_OFFSET_KIND seems to be 8 bytes
>
>
>allocate(map(ncells_hexa_),blocklength(ncells_hexa_))
>map = hexa_-1
>
>hexa_ is an array of 4-byte integers that labels the local hexa elements on a 
>given rank. In my current code the maximum value is 1024^3 and the minimum is 1.
>
>blocklength = 1
>call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_REAL_SP,fileview_hexa,ierr)
>
>MPI_REAL_SP is used for the 4-byte scalar data that will be written to the file 
>(e.g., the temperature scalar stored at a given hexa cell).
>
>call MPI_TYPE_COMMIT(fileview_hexa,ierr)
>map = map * 8
>
>Here is where the problems arise. The map is multiplied by 8 because the hexa 
>cell node connectivity needs to be written. The node numbering written to the 
>file needs to be 4 bytes, and the maximum node number still fits within a 
>4-byte signed integer. But since the displacements go up to 8*1024^3, map needs 
>to be integer(kind=8).
>
>blocklength = 8
>call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_INTEGER,fileview_hexa_conn,ierr)
>
>MPI_TYPE_INDEXED(1024^3, blocklength=(/8, 8, 8, ..., 8/), map=(/0, 8, 16, 24, 
>..., 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
>
>Would this be a correct way to declare the new datatype file_view_hexa_conn? In 
>this call blocklength would be a 4-byte integer array and map would be an 8-byte 
>integer array. To be clear, the code currently has both map and blocklength as 
>4-byte integer arrays.
>
>call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
>deallocate(map,blocklength)
>
>....
>
>disp = disp+84
>
>call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa,"native",MPI_INFO_NULL,ierr)
>call MPI_FILE_WRITE_ALL(iunit,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
>
>I believe this could be wrong as well: fileview_hexa is being used to write the 
>4-byte integer hexa labeling, but as you said, MPI_REAL_SP = MPI_INTEGER = 4 
>bytes, so it may be fine. It has not given any problems thus far.
>
>disp = disp+4*ncells_hexa
>call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa_conn,"native",MPI_INFO_NULL,ierr)
>size = 8*ncells_hexa_
>call MPI_FILE_WRITE_ALL(iunit,conn_hexa,size,MPI_INTEGER,status,ierr)
>
>
>Hopefully that is enough information about the issue. Now my questions:
>
>Does this implementation look correct?
>What kind should fileview_hexa and fileview_hexa_conn be?
>Is it okay that map and blocklength are different integer kinds?
>What does the blocklength parameter specify exactly? I played with it some, and 
>changing the blocklength did not seem to change anything.
>
>Thanks for the help. 
>
>-Dominic Kedelty
>
>
>On Wed, Mar 16, 2016 at 12:02 AM, Gilles Gouaillardet <gil...@rist.or.jp> 
>wrote:
>
>Dominic,
>
>at first, you might try to add
>call MPI_Barrier(comm,ierr)
>between
>
>  if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
>
>and
>
>  call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
>
>/* there might be a race condition, I am not sure about that */
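>
>i.e. the sequence would become (same names as in your subroutine):
>
>  if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
>  call MPI_Barrier(comm,ierr)
>  call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)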
>
>
>fwiw, the
>
>STOP A configuration file is required
>
>error message is not coming from Open MPI.
>It might be indirectly triggered by an ompio bug/limitation, or even by a bug in 
>your application.
>
>Did you get your application to work with another flavor of Open MPI?
>E.g., are you reporting an Open MPI bug?
>Or are you asking for help with your application (the bug could be either in 
>your code or in Open MPI, and you do not know for sure)?
>
>I am a bit surprised you are using the same fileview_node type with both 
>MPI_INTEGER and MPI_REAL_SP, but since they should be the same size, that might 
>not be an issue.
>
>the subroutine depends on too many external parameters
>(nnodes_, fileview_node, ncells_hexa, ncells_hexa_, unstr2str, ...)
>so writing a simple reproducer might not be trivial.
>
>I recommend you first write a self-contained program that can be shown to 
>reproduce the issue, and then I will investigate it. For that, you might want to 
>dump the array sizes and the description of fileview_node in your application, 
>and then hard-code them into your self-contained program.
>Also, how many nodes/tasks are you running, and what filesystem are you running 
>on?
>
>Cheers,
>
>Gilles 
>
>
>
>On 3/16/2016 3:05 PM, Dominic Kedelty wrote:
>
>Gilles, 
>
>
>I do not have the latest MPICH available. I tested using Open MPI version 1.8.7 
>as well as MVAPICH2 version 1.9; both produced similar errors. I tried the mca 
>flag that you provided, and it is telling me that a configuration file is 
>needed.
>
>
>all processes return:
>
>STOP A configuration file is required
>
>I am attaching the subroutine of the code where I believe the problem is 
>occurring.
>
>
>
>On Mon, Mar 14, 2016 at 6:25 PM, Gilles Gouaillardet 
><gilles.gouaillar...@gmail.com> wrote:
>
>Dominic, 
>
>
>This is a ROMIO error message, and ROMIO is from the MPICH project.
>
>At first, I recommend you try the same test with the latest MPICH, in order to 
>check whether the bug is indeed from ROMIO and has been fixed in the latest 
>release (Open MPI is a few versions behind the latest ROMIO).
>
>
>Would you be able to post a trimmed version of your application that evidences 
>the issue?
>
>That would be helpful for understanding what is going on.
>
>
>you might also want to give a try to
>
>mpirun --mca io ompio ...
>
>and see whether this helps. 
>
>That being said, I think ompio is not considered production-ready in the v1.10 
>series of Open MPI.
>
>
>Cheers,
>
>
>Gilles 
>
>
>
>On Tuesday, March 15, 2016, Dominic Kedelty <dkede...@asu.edu> wrote:
>
>I am getting the following error using Open MPI, and I am wondering if anyone 
>has a clue as to why it is happening. It is an error coming from Open MPI.
>
>Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) 
>fd_size=213909504 off=8617247540
>Error in ADIOI_Calc_aggregator(): rank_index(40) >= fd->hints->cb_nodes (40) 
>fd_size=213909504 off=8617247540
>application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
>application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
>
>Any help would be appreciated. Thanks.
>