Hi,
I think I have found a buffer overrun in a function
called by MPI::Init, though explanations of why I am
wrong are welcome.
I am using the Open MPI package included in Ubuntu Hardy,
version 1.2.5, though I have inspected the latest trunk by eye
and I don't believe the relevant code has changed.
I was trying to use Electric Fence, a memory debugging library,
to debug a suspected buffer overrun in my own program.
Electric Fence works by replacing malloc/free in such
a way that out-of-bounds accesses trigger a segfault.
While running my program under Electric Fence, I
got a segfault at:
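For readers unfamiliar with the technique, here is a minimal sketch of
the guard-page idea, assuming a POSIX system with mmap and mprotect.
It is not Electric Fence's actual code: guard_alloc is a hypothetical
name, there is no matching free, and it ignores the alignment handling
a real allocator needs. It only shows why a write one byte past the
buffer faults immediately.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of Electric Fence): returns `size` bytes
 * whose end sits directly before an inaccessible guard page, so touching
 * byte `size` raises SIGSEGV. Returns NULL on failure. */
void *guard_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;  /* round up to whole pages */
    size_t total = (data_pages + 1) * page;        /* one extra page as guard */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the final page inaccessible. */
    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }

    /* Place the buffer so that its end coincides with the guard page. */
    unsigned char *user = guard - size;
    assert((uintptr_t)(user + size) % page == 0);
    return user;
}
```

With a buffer from guard_alloc(216), writes to bytes 0 through 215
succeed, while a write to byte 216 lands on the guard page and faults.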
0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at
class/opal_free_list.c:113
113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
(gdb) bt
#0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
at class/opal_free_list.c:113
#1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56,
elem_class=0xb2b46e20, num_elements_to_alloc=73,
max_elements_to_alloc=-1, num_elements_per_alloc=1) at
class/opal_free_list.c:78
#2 0xb2b381aa in ompi_osc_pt2pt_component_init
(enable_progress_threads=false, enable_mpi_threads=false) at
osc_pt2pt_component.c:173
#3 0xb792b67c in ompi_osc_base_find_available
(enable_progress_threads=false, enable_mpi_threads=false) at
base/osc_base_open.c:84
#4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0,
provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
#5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
#6 0x0811ca6c in MPI::Init ()
#7 0x08118b8a in main ()
To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL
macro with its definition in opal/class/opal_object.h, and ran it again.
It appears that the invalid memory access is happening
on the instruction
((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
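To make the sequence of stores explicit, here is a toy model of what
the expanded macro does, based only on the expansion shown in the
attached patch. All toy_* names are hypothetical and the structs are
simplified stand-ins, not the real definitions from opal_object.h.
The point is that the class-pointer store is the first write into the
object, so it is the first place an undersized or misplaced buffer
would fault.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_object toy_object_t;

/* Simplified stand-in for a class descriptor. */
typedef struct toy_class {
    int cls_initialized;                    /* set on first use */
    void (*cls_construct)(toy_object_t *);  /* single constructor, may be NULL */
} toy_class_t;

/* Simplified stand-in for the object header at the start of every item. */
struct toy_object {
    toy_class_t *obj_class;
    int obj_reference_count;
};

/* Example element type: header first, then payload. */
typedef struct toy_item {
    toy_object_t super;
    int payload;
} toy_item_t;

void toy_item_construct(toy_object_t *obj)
{
    ((toy_item_t *)obj)->payload = 42;
}

void toy_class_initialize(toy_class_t *cls)
{
    cls->cls_initialized = 1;
}

/* Mirrors the three steps of the expansion: lazy class initialization,
 * header stores, then the constructor. `obj` must point to at least
 * sizeof(toy_object_t) writable bytes. */
void toy_construct_internal(toy_object_t *obj, toy_class_t *cls)
{
    assert(obj != NULL && cls != NULL);
    if (0 == cls->cls_initialized)
        toy_class_initialize(cls);
    obj->obj_class = cls;          /* first write into the object */
    obj->obj_reference_count = 1;
    if (cls->cls_construct != NULL)
        cls->cls_construct(obj);
}
```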
To pin this down, I modified the source of opal_free_list.c
with the attached patch. It adds a few debugging fprintfs to
show exactly what the code is doing. The output of the debugging
statements is:
mpidebug: allocating 216
mpidebug: allocated at memory address 0xb62bdf28
mpidebug: accessing address 0xb62be000
[segfault]
Now, 0xb62be000 - 0xb62bdf28 = 0xd8 = 216, which is
exactly the size of the buffer allocated. The faulting
access is therefore to the first byte past the end of
the buffer, and so I think this is a buffer overrun.
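The arithmetic can be checked mechanically. The sketch below uses
the addresses and size from the debugging output; the helper names
are made up for illustration, and the 4-byte access width is an
assumption (a pointer store on a 32-bit build, as the addresses in
the backtrace suggest).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of an access relative to the start of the allocation. */
uintptr_t access_offset(uintptr_t base, uintptr_t addr)
{
    assert(addr >= base);
    return addr - base;
}

/* Returns 1 if the access [addr, addr + width) is not fully contained
 * in the allocation [base, base + size), and 0 otherwise. */
int is_out_of_bounds(uintptr_t base, size_t size, uintptr_t addr, size_t width)
{
    if (addr < base)
        return 1;
    uintptr_t off = addr - base;
    return off > size || width > size - off;
}
```

With base 0xb62bdf28, size 216 and access address 0xb62be000, the
offset is 216, so any access of nonzero width there is out of bounds.
The last valid 4-byte access would start at offset 212.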
Steps to reproduce:
a) Install Electric Fence
b) Compile the following program
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>
int main(int argc, char **argv)
{
MPI::Init(argc, argv);
MPI::Finalize();
return 0;
}
with
mpiCC -o test ./test.cpp
c) gdb ./test
d) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
e) run
Hope this helps,
Patrick Farrell
--
Patrick Farrell
PhD student
Imperial College London
--- openmpi-1.2.5/opal/class/opal_free_list.c 2008-08-23 18:35:03.000000000 +0100
+++ openmpi-1.2.5-modified/opal/class/opal_free_list.c 2008-08-23 18:31:47.000000000 +0100
@@ -90,9 +90,12 @@
if (flist->fl_max_to_alloc > 0 && flist->fl_num_allocated + num_elements > flist->fl_max_to_alloc)
return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
+ fprintf(stderr, "mpidebug: allocating %lu\n", (unsigned long)((num_elements * flist->fl_elem_size) + sizeof(opal_list_item_t) + CACHE_LINE_SIZE));
alloc_ptr = (unsigned char *)malloc((num_elements * flist->fl_elem_size) +
sizeof(opal_list_item_t) +
CACHE_LINE_SIZE);
+ fprintf(stderr, "mpidebug: allocated at memory address %p\n", alloc_ptr);
+
if(NULL == alloc_ptr)
return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
@@ -110,7 +113,16 @@
for(i=0; i<num_elements; i++) {
opal_free_list_item_t* item = (opal_free_list_item_t*)ptr;
if (NULL != flist->fl_elem_class) {
- OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
+ do {
+ if (0 == (flist->fl_elem_class)->cls_initialized) {
+ opal_class_initialize((flist->fl_elem_class));
+ }
+ fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_class);
+ ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
+ fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_reference_count);
+ ((opal_object_t *) (item))->obj_reference_count = 1;
+ opal_obj_run_constructors((opal_object_t *) (item));
+ } while (0);
}
opal_list_append(&(flist->super), &(item->super));
ptr += flist->fl_elem_size;
@@ -119,5 +131,3 @@
return OPAL_SUCCESS;
}
-
-