bzero is not a gnu-ism -- it's in POSIX.1. Either bzero or memset is
correct and used throughout OMPI.
Brian
On Thu, 21 Aug 2008, Jeff Squyres wrote:
IIRC, bzero is a gnu-ism. We should probably use memset instead.
On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:
Terry,
We rely on the behavior POSIX defines for mmap, where the area should be
zero-filled when the file length is extended. Which OS are you using when
you see these problems?
Just in case, here is a patch that sets the beginning of the mmapped region
to zero, in case this is not done automatically. Since in most cases this is
unnecessary overhead, we should find the cases where we really need
this, and possibly conditionally compile it.
Index: ompi/mca/common/sm/common_sm_mmap.c
===================================================================
--- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
+++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
@@ -163,6 +163,7 @@
/* initialize the segment - only the first process
to open the file */
+ bzero( map->data_addr, size );
mem_offset = map->data_addr - (unsigned char *)map->map_seg;
map->map_seg->seg_offset = mem_offset;
map->map_seg->seg_size = size - mem_offset;
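
For reference, the zero-fill behavior the patch guards against can be sketched in isolation. This is a minimal illustration, not the OMPI code: the helper names map_zeroed_segment and segment_is_zero, the file path passed in, and the force_clear flag are all hypothetical. It grows a fresh file with ftruncate(), maps it shared, and optionally clears it with memset(), which plays the same role as the bzero() call in the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from OMPI): create a backing file, extend it to
 * `size` bytes, and map it shared. POSIX says the extended area should read
 * as zeros; `force_clear` clears it explicitly anyway, as the patch does. */
static void *map_zeroed_segment(const char *path, size_t size, int force_clear)
{
    assert(path != NULL && size > 0);

    /* O_TRUNC makes the file start at length 0, so all of it is "extended". */
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        unlink(path);
        return NULL;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        unlink(path);
        return NULL;
    }

    if (force_clear) {
        memset(addr, 0, size); /* same effect as bzero(addr, size) */
    }
    return addr;
}

/* Returns 1 if every byte in the segment is zero, 0 otherwise. */
static int segment_is_zero(const void *addr, size_t size)
{
    const unsigned char *p = addr;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling map_zeroed_segment(path, size, 0) and then segment_is_zero() on the result is one way to check whether a given OS and file system honor the zero-fill expectation without the explicit clear.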
george.
On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:
I've been seeing an intermittent segv (about once every 4 hours when looping
on a quick initialization program) with the following stack trace.
=>[1] mca_btl_sm_add_procs(btl = 0xfffffd7ffdb67ef0, nprocs = 2U, procs =
0x591560, peers = 0x591580, reachability = 0xfffffd7fffdff000), line 519
in "btl_sm.c"
[2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560, bml_endpoints =
0x591500, reachable = 0xfffffd7fffdff000), line 222 in "bml_r2.c"
[3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248 in
"pml_ob1.c"
[4] ompi_mpi_init(argc = 1, argv = 0xfffffd7fffdff318, requested = 0,
provided = 0xfffffd7fffdff234), line 651 in "ompi_mpi_init.c"
[5] PMPI_Init(argc = 0xfffffd7fffdff2ec, argv = 0xfffffd7fffdff2e0), line
90 in "pinit.c"
[6] main(argc = 1, argv = 0xfffffd7fffdff318), line 82 in "buffer.c"
I believe the problem is that mca_btl_sm_component.shm_fifo[j] contains
uninitialized data, which causes the loop on line 504 in btl_sm.c to think
that a remote rank has set its fifo address.
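
To illustrate the suspected failure mode, here is a small sketch. It is not the btl_sm.c code: the struct sketch_slots type, the SKETCH_NPROCS constant, and the count_published function are hypothetical, and it assumes that a slot value of 0 means "not yet set". Under that assumption, a slot holding leftover garbage looks the same as a slot a peer has really filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NPROCS 4 /* hypothetical number of local ranks */

/* Hypothetical stand-in for a per-rank table kept in the shared segment.
 * Assumption: 0 means "this rank has not published its fifo address yet". */
struct sketch_slots {
    uintptr_t fifo_addr[SKETCH_NPROCS];
};

/* Count the slots that look published, i.e. are nonzero. A waiting loop
 * built on this test cannot tell real addresses from uninitialized bytes. */
static int count_published(const struct sketch_slots *s)
{
    assert(s != NULL);
    int n = 0;
    for (size_t j = 0; j < SKETCH_NPROCS; j++) {
        if (s->fifo_addr[j] != 0) {
            n++;
        }
    }
    return n;
}
```

If the table is filled with arbitrary nonzero bytes, count_published() reports every slot as set even though no rank wrote anything; after memset(&s, 0, sizeof s) it reports none until a rank stores its address.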
Has anyone else seen the above happening?
--td
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel