I agree with George that zeroing memory only in the debug builds could hide bugs, and thus would want to see the debug and non-debug builds have the same behavior (both malloc or both calloc). So, I also agree this looks initially like a hard choice.
What about using malloc() in non-debug builds and having a MCA param control malloc-vs-calloc in a debug build (with malloc being the default)? The param name could be something with "valgrind" in it to allow it to control any other "paranoid code" that may be introduced just to silence valgrind warnings. -Paul On Fri, Oct 3, 2014 at 3:02 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > It's a tough call. This proposal will create significant differences > between the debug and fast builds. As the entire objects will be set to > zero this might reduce bugs in the debug build, bugs that will be horribly > difficult to track in any non-debug builds. Moreover, if the structures are > carefully accessed in our code, adding such a disruptive initialization > just to prevent valgrind from reporting false-positive about uninitialized > reads in memcpy is too costly as a solution (I am also conscient that it > will be almost impossible to write a valgrind suppression rule for the > specific case you mention). > > Some parts of the code have (or at least had) some level of cleanness for > the gaps in the structures. The solution was to minimally zero-fy the gaps, > maintaining the same behavior between debug and non-debug builds. However, > in order to do this one need to know the layout of the structure, so this > is not a completely generic solution... > > George. > > > On Oct 3, 2014, at 16:54 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > > WHAT: change the malloc() to calloc() in opal_obj_new() (perhaps only in > debug builds?) > > > > WHY: Drastically reduces valgrind output > > > > WHERE: see > https://github.com/open-mpi/ompi/blob/master/opal/class/opal_object.h#L462-L467 > > > > TIMEOUT: teleconf, Tue, Oct 14 (there's no rush) > > > > MORE DETAIL: > > > > I was debugging some code today and came across a bunch of places where > we write structs down various IPC mechanisms, and the structs contain > holes. In most places, the performance doesn't matter / the readability of > struct members is more important, so we haven't re-ordered the structs to > remove holes. But consequently, those holes end up uninitialized, and > therefore memcpy()ing or write()ing instances of these structs causes > valgrind to emit warnings. > > > > The patch below eliminates most (all?) of these valgrind warnings -- in > debug builds, it changes the malloc() inside OBJ_NEW to a calloc(). > > > > Upon a little more thought, however, I wonder if we use OBJ_NEW in any > fast code paths (other than in bulk, such as when we need to grow a free > list). Specifically: would it be terrible to *always* calloc -- not just > for debug builds? > > > > ----- > > diff --git a/opal/class/opal_object.h b/opal/class/opal_object.h > > index 7012bac..585f13e 100644 > > --- a/opal/class/opal_object.h > > +++ b/opal/class/opal_object.h > > @@ -464,7 +464,11 @@ static inline opal_object_t > *opal_obj_new(opal_class_t * cl > > opal_object_t *object; > > assert(cls->cls_sizeof >= sizeof(opal_object_t)); > > > > +#if OPAL_ENABLE_DEBUG > > + object = (opal_object_t *) calloc(1, cls->cls_sizeof); > > +#else > > object = (opal_object_t *) malloc(cls->cls_sizeof); > > +#endif > > if (0 == cls->cls_initialized) { > > opal_class_initialize(cls); > > } > > ----- > > > > -- > > Jeff Squyres > > jsquy...@cisco.com > > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > _______________________________________________ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16001.php > > _______________________________________________ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16004.php > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900