Patrick,
I'm unable to reproduce the buffer overrun with the latest trunk. I
run valgrind (with the memchecker tool) on the trunk on a regular
basis, and I have never noticed anything like that. Moreover, I went
over the code, and I cannot see how we could overrun the buffer in
the code you pinpointed.
Thanks,
george.
On Aug 23, 2008, at 7:57 PM, Patrick Farrell wrote:
> Hi,
>
> I think I have found a buffer overrun in a function
> called by MPI::Init, though explanations of why I am
> wrong are welcome.
>
> I am using the Open MPI shipped with Ubuntu Hardy, version 1.2.5,
> though I have inspected the latest trunk by eye and I don't believe
> the relevant code has changed.
>
> I was trying to use Electric Fence, a memory debugging library,
> to debug a suspected buffer overrun in my own program.
> Electric Fence works by replacing malloc/free so that out-of-bounds
> accesses trigger an immediate segfault (a minimal sketch follows).
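>
> (A minimal, hypothetical sanity check of that behaviour, not part of
> the report itself: a one-byte overrun that glibc's malloc usually
> lets pass unnoticed but that Electric Fence traps immediately.
>
> #include <stdlib.h>
>
> int main(void)
> {
>     char *buf = malloc(216);  /* same size as the free list allocation below */
>     if (NULL == buf)
>         return 1;
>     buf[216] = 0;             /* one byte past the end: Electric Fence faults here */
>     free(buf);
>     return 0;
> }
>
> Compiled with "gcc -o overrun overrun.c" (hypothetical file name) and
> run with LD_PRELOAD=/usr/lib/libefence.so.0.0, the write past the end
> of the buffer lands on Electric Fence's guard page and segfaults at
> exactly the offending instruction.)
>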
> While running my program under Electric Fence, I got a segfault
> at:
>
> 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
> 113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
> (gdb) bt
> #0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
> #1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56, elem_class=0xb2b46e20, num_elements_to_alloc=73, max_elements_to_alloc=-1, num_elements_per_alloc=1) at class/opal_free_list.c:78
> #2 0xb2b381aa in ompi_osc_pt2pt_component_init (enable_progress_threads=false, enable_mpi_threads=false) at osc_pt2pt_component.c:173
> #3 0xb792b67c in ompi_osc_base_find_available (enable_progress_threads=false, enable_mpi_threads=false) at base/osc_base_open.c:84
> #4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0, provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
> #5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
> #6 0x0811ca6c in MPI::Init ()
> #7 0x08118b8a in main ()
>
> To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL
> macro with its definition from opal/class/opal_object.h and ran
> again. It appears that the invalid memory access happens on the
> statement
>
> ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
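>
> (A simplified, hypothetical model of why that single store can cross
> the end of the allocation; the real opal_object_t in
> opal/class/opal_object.h has more fields, especially in debug builds,
> so the layout below is illustrative only.
>
> #include <stdio.h>
> #include <stddef.h>
> #include <stdint.h>
>
> typedef struct opal_class_t opal_class_t;   /* opaque for this sketch */
>
> typedef struct {
>     opal_class_t *obj_class;                /* written by the faulting statement */
>     volatile int32_t obj_reference_count;
> } opal_object_t;
>
> int main(void)
> {
>     /* The assignment stores a full pointer starting at
>        item + offsetof(opal_object_t, obj_class); if item sits within
>        sizeof(void *) bytes of the end of the malloc'd region, that
>        store crosses the end of the buffer. */
>     printf("store covers bytes [%zu, %zu) from the start of item\n",
>            offsetof(opal_object_t, obj_class),
>            offsetof(opal_object_t, obj_class) + sizeof(void *));
>     return 0;
> }
>
> With Electric Fence the end of the buffer abuts a protected page, so
> such a store faults immediately instead of silently corrupting
> whatever follows the allocation.)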
>
> Investigating further, I modified the source of opal_free_list.c
> with the attached patch. It adds a few debugging printfs to
> diagnose exactly what the code is doing. The output of the
> debugging statements is:
>
> mpidebug: allocating 216
> mpidebug: allocated at memory address 0xb62bdf28
> mpidebug: accessing address 0xb62be000
> [segfault]
>
> Now, 0xb62be000 - 0xb62bdf28 = 216, which is exactly the size of
> the allocated buffer, so the faulting access starts at the first
> byte past its end; I therefore think this is a buffer overrun.
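>
> (A tiny, self-contained check of that arithmetic, using only the
> three values printed by the debug output above:
>
> #include <stdio.h>
>
> int main(void)
> {
>     unsigned long alloc_base = 0xb62bdf28UL;  /* "allocated at memory address" */
>     unsigned long fault_addr = 0xb62be000UL;  /* "accessing address" that faults */
>     unsigned long alloc_size = 216;           /* "allocating 216" */
>
>     /* Valid offsets into the buffer are 0 .. alloc_size - 1, so an
>        access at offset 216 starts one byte past the end of a
>        216-byte buffer. */
>     printf("offset of faulting access: %lu bytes\n", fault_addr - alloc_base);
>     printf("past the end of the allocation: %s\n",
>            fault_addr - alloc_base >= alloc_size ? "yes" : "no");
>     return 0;
> }
>
> This prints an offset of 216 bytes, i.e. the first byte past the
> allocation, which is consistent with Electric Fence placing its
> guard page immediately after the buffer.)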
>
> Steps to reproduce:
>
> a) Install Electric Fence
> b) Compile the following program
>
> #include <stdlib.h>
> #include <unistd.h>
>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     MPI::Init(argc, argv);
>     MPI::Finalize();
>
>     return 0;
> }
>
> with
>
> mpiCC -o test ./test.cpp
>
> c) gdb ./test
> d) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
> e) run
>
> Hope this helps,
>
> Patrick Farrell
>
> --
> Patrick Farrell
> PhD student
> Imperial College London
> --- openmpi-1.2.5/opal/class/opal_free_list.c 2008-08-23 18:35:03.000000000 +0100
> +++ openmpi-1.2.5-modified/opal/class/opal_free_list.c 2008-08-23 18:31:47.000000000 +0100
> @@ -90,9 +90,12 @@
>      if (flist->fl_max_to_alloc > 0 && flist->fl_num_allocated + num_elements > flist->fl_max_to_alloc)
>          return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
>
> +    fprintf(stderr, "mpidebug: allocating %d\n", (num_elements * flist->fl_elem_size) + sizeof(opal_list_item_t) + CACHE_LINE_SIZE);
>      alloc_ptr = (unsigned char *)malloc((num_elements * flist->fl_elem_size) +
>                                          sizeof(opal_list_item_t) +
>                                          CACHE_LINE_SIZE);
> +    fprintf(stderr, "mpidebug: allocated at memory address %p\n", alloc_ptr);
> +
>      if(NULL == alloc_ptr)
>          return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
>
> @@ -110,7 +113,16 @@
>      for(i=0; i<num_elements; i++) {
>          opal_free_list_item_t* item = (opal_free_list_item_t*)ptr;
>          if (NULL != flist->fl_elem_class) {
> -            OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
> +            do {
> +                if (0 == (flist->fl_elem_class)->cls_initialized) {
> +                    opal_class_initialize((flist->fl_elem_class));
> +                }
> +                fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_class);
> +                ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
> +                fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_reference_count);
> +                ((opal_object_t *) (item))->obj_reference_count = 1;
> +                opal_obj_run_constructors((opal_object_t *) (item));
> +            } while (0);
>          }
>          opal_list_append(&(flist->super), &(item->super));
>          ptr += flist->fl_elem_size;
> @@ -119,5 +131,3 @@
>      return OPAL_SUCCESS;
>  }
>
> -
> -