Dominic,
with MPI_Type_indexed, array_of_displacements is an int[], so yes, there is a risk of overflow.
With MPI_Type_create_hindexed, on the other hand, array_of_displacements is an MPI_Aint[].
note:
array_of_displacements
    Displacement for each block, in multiples of oldtype extent for MPI_Type_indexed and bytes for MPI_Type_create_hindexed (array of integer for MPI_TYPE_INDEXED, array of MPI_Aint for MPI_TYPE_CREATE_HINDEXED).
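for example, a minimal sketch of the hindexed variant (names taken from your code below, assuming 8 integers per hexa cell and that the surrounding declarations from your routine are in scope) could look like:

! sketch only: same view as your second MPI_TYPE_INDEXED, but with
! byte displacements held in MPI_Aint, so no 32-bit overflow
integer(KIND=MPI_ADDRESS_KIND), dimension(:), allocatable :: map_bytes
integer :: intsize
call MPI_TYPE_SIZE(MPI_INTEGER, intsize, ierr)
allocate(map_bytes(ncells_hexa_))
! hexa_ is 1-based; each cell owns 8 integers, displacement is in bytes
map_bytes = int(hexa_ - 1, MPI_ADDRESS_KIND) * 8 * intsize
blocklength = 8
call MPI_TYPE_CREATE_HINDEXED(ncells_hexa_, blocklength, map_bytes, &
     MPI_INTEGER, fileview_hexa_conn, ierr)
call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)
deallocate(map_bytes)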
i do not fully understand what you are trying to achieve ...
MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8 /), map=(/0, 8, 16, 24, ..... , 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
at first glance, this is equivalent to
MPI_Type_contiguous(8*1024^3, MPI_INTEGER, file_view_hexa_conn, ierr)
so unless your data has a non-regular shape, i recommend you use other MPI primitives to create your datatype.
This should be much more efficient, and less prone to integer overflow.
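for example, in the fully regular case, something like this sketch (names assumed from your code, and assuming the connectivity really is one contiguous run per rank) avoids large counts and displacements altogether:

! sketch only: one hexa cell = 8 integers, then ncells of those cells,
! so no count or displacement ever exceeds the default integer range
integer :: hexa_cell_type
call MPI_TYPE_CONTIGUOUS(8, MPI_INTEGER, hexa_cell_type, ierr)
call MPI_TYPE_CONTIGUOUS(ncells_hexa_, hexa_cell_type, fileview_hexa_conn, ierr)
call MPI_TYPE_COMMIT(fileview_hexa_conn, ierr)
call MPI_TYPE_FREE(hexa_cell_type, ierr)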
Cheers,
Gilles
On 3/23/2016 2:50 PM, Dominic Kedelty wrote:
Hi Gilles,
I believe I have found the problem. Initially I thought it might have been an MPI issue, since it showed up inside an MPI function. However, now I am sure that the problem has to do with an overflow of 4-byte signed integers.
I am dealing with computational domains that have a little more than a billion cells (1024^3 cells). However, I am still within the limits of the 4-byte integer. The area where I am running into the problem is below; I have shortened the code:
! Fileviews
integer :: fileview_hexa
integer :: fileview_hexa_conn
integer, dimension(:), allocatable :: blocklength
integer, dimension(:), allocatable :: map
integer(KIND=8) :: size
integer(KIND=MPI_OFFSET_KIND) :: disp ! MPI_OFFSET_KIND seems to be 8 bytes
allocate(map(ncells_hexa_),blocklength(ncells_hexa_))
map = hexa_-1
hexa_ is a 4-byte array of integers that labels the local hexa elements on a given rank. The maximum this number can be in my current code is 1024^3, and the minimum is 1.
blocklength = 1
call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_REAL_SP,fileview_hexa,ierr)
MPI_REAL_SP is used for the 4-byte scalar data that are going to be written to the file (e.g. a temperature scalar stored at a given hexa cell).
call MPI_TYPE_COMMIT(fileview_hexa,ierr)
map = map * 8
Here is where the problems arise. The map is multiplied by 8 because the hexa cell node connectivity needs to be written. The node numbering written to the file needs to be 4 bytes, and the maximum node number still fits within a 4-byte signed integer. But since the displacements to be written go up to 8*1024^3, map needs to be integer(kind=8).
blocklength = 8
call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_INTEGER,fileview_hexa_conn,ierr)
MPI_TYPE_INDEXED(1024^3, blocklength=(/8 8 8 8 8 ..... 8 8 /), map=(/0, 8, 16, 24, ..... , 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
Would this be a correct way to declare the new datatype file_view_hexa_conn? In this call, blocklength would be a 4-byte integer array and map would be an 8-byte integer array. To be clear, the code currently has both map and blocklength as 4-byte integer arrays.
call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
deallocate(map,blocklength)
....
disp = disp+84
call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa,"native",MPI_INFO_NULL,ierr)
call MPI_FILE_WRITE_ALL(iunit,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
I believe this could be wrong as well: fileview_hexa is being used to write the 4-byte integer hexa labeling, but as you said, MPI_REAL_SP = MPI_INTEGER = 4 bytes, so it may be fine. It has not given any problems thus far.
disp = disp+4*ncells_hexa
call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa_conn,"native",MPI_INFO_NULL,ierr)
size = 8*ncells_hexa_
call MPI_FILE_WRITE_ALL(iunit,conn_hexa,size,MPI_INTEGER,status,ierr)
Hopefully that is enough information about the issue. Now my questions:
1. Does this implementation look correct?
2. What kind should fileview_hexa and fileview_hexa_conn be?
3. Is it okay that map and blocklength are different integer kinds?
4. What does the blocklength parameter specify exactly? I played with this some, and changing the blocklength did not seem to change anything.
Thanks for the help.
-Dominic Kedelty
On Wed, Mar 16, 2016 at 12:02 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Dominic,
at first, you might try to add
call MPI_Barrier(comm,ierr)
between
if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
and
call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
/* there might be a race condition, i am not sure about that */
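i.e. something like this (sketch only, using the names from your code):

if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
! make sure the delete has completed before anyone re-creates the file
call MPI_BARRIER(comm,ierr)
call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)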
fwiw, the
STOP A configuration file is required
error message is not coming from OpenMPI.
it might be indirectly triggered by an ompio bug/limitation, or even a bug in your application.
did you get your application to work with another flavor of OpenMPI? e.g. are you reporting an OpenMPI bug, or are you asking for some help with your application (the bug could be either in your code or in OpenMPI, and you do not know for sure)?
i am a bit surprised you are using the same fileview_node type with both MPI_INTEGER and MPI_REAL_SP, but since they should be the same size, that might not be an issue.
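if in doubt, a quick sketch to check that (assuming MPI_REAL_SP is the single precision real type defined in your application):

integer :: isize, rsize
call MPI_TYPE_SIZE(MPI_INTEGER, isize, ierr)
call MPI_TYPE_SIZE(MPI_REAL_SP, rsize, ierr)
! both should report 4 bytes for the fileview reuse to be harmless
if (isize .ne. rsize) print *, 'size mismatch:', isize, rsize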
the subroutine depends on too many external parameters (nnodes_, fileview_node, ncells_hexa, ncells_hexa_, unstr2str, ...), so writing a simple reproducer might not be trivial.
i recommend you first write a self-contained program that reproduces the issue, and then i will investigate it. for that, you might want to dump the array sizes and the description of fileview_node in your application, and then hard-code them into your self-contained program.
also, how many nodes/tasks are you running, and what filesystem are you running on?
Cheers,
Gilles
On 3/16/2016 3:05 PM, Dominic Kedelty wrote:
Gilles,
I do not have the latest MPICH available. I tested using OpenMPI version 1.8.7 as well as MVAPICH2 version 1.9; both produced similar errors. I tried the mca flag that you provided, and it is telling me that a configuration file is needed.
All processes return:
STOP A configuration file is required
I am attaching the subroutine of the code where I believe the problem is occurring.
On Mon, Mar 14, 2016 at 6:25 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
Dominic,
this is a ROMIO error message, and ROMIO is from the MPICH project.
at first, I recommend you try the same test with the latest mpich, in order to check whether the bug is indeed from romio and has been fixed in the latest release.
(ompi is a few versions behind the latest romio)
would you be able to post a trimmed version of your application that evidences the issue?
that will be helpful to understand what is going on.
you might also want to give a try to
mpirun --mca io ompio ...
and see whether this helps.
that being said, I think ompio is not considered production-ready on the v1.10 series of ompi.
Cheers,
Gilles
On Tuesday, March 15, 2016, Dominic Kedelty <dkede...@asu.edu> wrote:
I am getting the following error using OpenMPI, and I am wondering if anyone would have a clue as to why it is happening. It is an error coming from OpenMPI.
Error in ADIOI_Calc_aggregator(): rank_index(40) >=
fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
Error in ADIOI_Calc_aggregator(): rank_index(40) >=
fd->hints->cb_nodes (40) fd_size=213909504 off=8617247540
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 157
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 477
Any help would be appreciated. Thanks.