How about a compromise -- how about enabling calloc() when --with-valgrind is 
specified on the command line?

I.e., don't tie it to debug builds, but to valgrind-enabled builds?


On Oct 3, 2014, at 6:11 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> I agree with George that zeroing memory only in the debug builds could hide 
> bugs, and thus would want to see the debug and non-debug builds have the same 
> behavior (both malloc or both calloc).  So, I also agree this looks initially 
> like a hard choice.
> 
> What about using malloc() in non-debug builds and having a MCA param control 
> malloc-vs-calloc in a debug build (with malloc being the default)?  The param 
> name could be something with "valgrind" in it to allow it to control any 
> other "paranoid code" that may be introduced just to silence valgrind 
> warnings.
> 
> -Paul
> 
> On Fri, Oct 3, 2014 at 3:02 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> It’s a tough call. This proposal will create significant differences between 
> the debug and fast builds. As the entire objects will be set to zero this 
> might reduce bugs in the debug build, bugs that will be horribly difficult to 
> track in any non-debug builds. Moreover, if the structures are carefully 
> accessed in our code, adding such a disruptive initialization just to prevent 
> valgrind from reporting false-positive about uninitialized reads in memcpy is 
> too costly as a solution (I am also conscient that it will be almost 
> impossible to write a valgrind suppression rule for the specific case you 
> mention).
> 
> Some parts of the code have (or at least had) some level of cleanness for the 
> gaps in the structures. The solution was to minimally zero-fy the gaps, 
> maintaining the same behavior between debug and non-debug builds. However, in 
> order to do this one need to know the layout of the structure, so this is not 
> a completely generic solution…
> 
>   George.
> 
> 
> On Oct 3, 2014, at 16:54 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> 
> > WHAT: change the malloc() to calloc() in opal_obj_new() (perhaps only in 
> > debug builds?)
> >
> > WHY: Drastically reduces valgrind output
> >
> > WHERE: see 
> > https://github.com/open-mpi/ompi/blob/master/opal/class/opal_object.h#L462-L467
> >
> > TIMEOUT: teleconf, Tue, Oct 14 (there's no rush)
> >
> > MORE DETAIL:
> >
> > I was debugging some code today and came across a bunch of places where we 
> > write structs down various IPC mechanisms, and the structs contain holes.  
> > In most places, the performance doesn't matter / the readability of struct 
> > members is more important, so we haven't re-ordered the structs to remove 
> > holes.  But consequently, those holes end up uninitialized, and therefore 
> > memcpy()ing or write()ing instances of these structs causes valgrind to 
> > emit warnings.
> >
> > The patch below eliminates most (all?) of these valgrind warnings -- in 
> > debug builds, it changes the malloc() inside OBJ_NEW to a calloc().
> >
> > Upon a little more thought, however, I wonder if we use OBJ_NEW in any fast 
> > code paths (other than in bulk, such as when we need to grow a free list).  
> > Specifically: would it be terrible to *always* calloc -- not just for debug 
> > builds?
> >
> > -----
> > diff --git a/opal/class/opal_object.h b/opal/class/opal_object.h
> > index 7012bac..585f13e 100644
> > --- a/opal/class/opal_object.h
> > +++ b/opal/class/opal_object.h
> > @@ -464,7 +464,11 @@ static inline opal_object_t *opal_obj_new(opal_class_t 
> > * cl
> >     opal_object_t *object;
> >     assert(cls->cls_sizeof >= sizeof(opal_object_t));
> >
> > +#if OPAL_ENABLE_DEBUG
> > +    object = (opal_object_t *) calloc(1, cls->cls_sizeof);
> > +#else
> >     object = (opal_object_t *) malloc(cls->cls_sizeof);
> > +#endif
> >     if (0 == cls->cls_initialized) {
> >         opal_class_initialize(cls);
> >     }
> > -----
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: 
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/10/16001.php
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/10/16004.php
> 
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/10/16005.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

Reply via email to