Jeff, Using calloc() only subject to --with-valgrind sounds good to me. If I'd known such a option exists, I'd not have suggested the MCA param idea.
-Paul On Fri, Oct 3, 2014 at 3:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > How about a compromise -- how about enabling calloc() when --with-valgrind > is specified on the command line? > > I.e., don't tie it to debug builds, but to valgrind-enabled builds? > > > On Oct 3, 2014, at 6:11 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: > > > I agree with George that zeroing memory only in the debug builds could > hide bugs, and thus would want to see the debug and non-debug builds have > the same behavior (both malloc or both calloc). So, I also agree this > looks initially like a hard choice. > > > > What about using malloc() in non-debug builds and having a MCA param > control malloc-vs-calloc in a debug build (with malloc being the default)? > The param name could be something with "valgrind" in it to allow it to > control any other "paranoid code" that may be introduced just to silence > valgrind warnings. > > > > -Paul > > > > On Fri, Oct 3, 2014 at 3:02 PM, George Bosilca <bosi...@icl.utk.edu> > wrote: > > It's a tough call. This proposal will create significant differences > between the debug and fast builds. As the entire objects will be set to > zero this might reduce bugs in the debug build, bugs that will be horribly > difficult to track in any non-debug builds. Moreover, if the structures are > carefully accessed in our code, adding such a disruptive initialization > just to prevent valgrind from reporting false-positive about uninitialized > reads in memcpy is too costly as a solution (I am also conscient that it > will be almost impossible to write a valgrind suppression rule for the > specific case you mention). > > > > Some parts of the code have (or at least had) some level of cleanness > for the gaps in the structures. The solution was to minimally zero-fy the > gaps, maintaining the same behavior between debug and non-debug builds. > However, in order to do this one need to know the layout of the structure, > so this is not a completely generic solution... > > > > George. > > > > > > On Oct 3, 2014, at 16:54 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > > > > WHAT: change the malloc() to calloc() in opal_obj_new() (perhaps only > in debug builds?) > > > > > > WHY: Drastically reduces valgrind output > > > > > > WHERE: see > https://github.com/open-mpi/ompi/blob/master/opal/class/opal_object.h#L462-L467 > > > > > > TIMEOUT: teleconf, Tue, Oct 14 (there's no rush) > > > > > > MORE DETAIL: > > > > > > I was debugging some code today and came across a bunch of places > where we write structs down various IPC mechanisms, and the structs contain > holes. In most places, the performance doesn't matter / the readability of > struct members is more important, so we haven't re-ordered the structs to > remove holes. But consequently, those holes end up uninitialized, and > therefore memcpy()ing or write()ing instances of these structs causes > valgrind to emit warnings. > > > > > > The patch below eliminates most (all?) of these valgrind warnings -- > in debug builds, it changes the malloc() inside OBJ_NEW to a calloc(). > > > > > > Upon a little more thought, however, I wonder if we use OBJ_NEW in any > fast code paths (other than in bulk, such as when we need to grow a free > list). Specifically: would it be terrible to *always* calloc -- not just > for debug builds? > > > > > > ----- > > > diff --git a/opal/class/opal_object.h b/opal/class/opal_object.h > > > index 7012bac..585f13e 100644 > > > --- a/opal/class/opal_object.h > > > +++ b/opal/class/opal_object.h > > > @@ -464,7 +464,11 @@ static inline opal_object_t > *opal_obj_new(opal_class_t * cl > > > opal_object_t *object; > > > assert(cls->cls_sizeof >= sizeof(opal_object_t)); > > > > > > +#if OPAL_ENABLE_DEBUG > > > + object = (opal_object_t *) calloc(1, cls->cls_sizeof); > > > +#else > > > object = (opal_object_t *) malloc(cls->cls_sizeof); > > > +#endif > > > if (0 == cls->cls_initialized) { > > > opal_class_initialize(cls); > > > } > > > ----- > > > > > > -- > > > Jeff Squyres > > > jsquy...@cisco.com > > > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > > _______________________________________________ > > > devel mailing list > > > de...@open-mpi.org > > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16001.php > > > > _______________________________________________ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16004.php > > > > > > > > -- > > Paul H. Hargrove phhargr...@lbl.gov > > Future Technologies Group > > Computer and Data Sciences Department Tel: +1-510-495-2352 > > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > > _______________________________________________ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16005.php > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > _______________________________________________ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/16006.php > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900