I agree with George that zeroing memory only in the debug builds could hide
bugs, and thus would want to see the debug and non-debug builds have the
same behavior (both malloc or both calloc).  So, I also agree this looks
initially like a hard choice.

What about using malloc() in non-debug builds and having an MCA param
control malloc-vs-calloc in a debug build (with malloc being the default)?
The param name could be something with "valgrind" in it to allow it to
control any other "paranoid code" that may be introduced just to silence
valgrind warnings.
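
A minimal sketch of that idea, under stated assumptions: the flag
sketch_valgrind_paranoid, the environment variable
SKETCH_VALGRIND_PARANOID, the macro SKETCH_ENABLE_DEBUG and both helper
functions are hypothetical names invented for illustration. A real version
would register the param through the MCA machinery, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for OPAL_ENABLE_DEBUG; defaults to a debug build. */
#ifndef SKETCH_ENABLE_DEBUG
#define SKETCH_ENABLE_DEBUG 1
#endif

/* Hypothetical stand-in for the MCA param; default is plain malloc(). */
static bool sketch_valgrind_paranoid = false;

/* Stand-in for the param lookup: reads a hypothetical environment variable.
 * Real MCA param registration is not shown. */
static void sketch_read_param(void)
{
    const char *val = getenv("SKETCH_VALGRIND_PARANOID");
    sketch_valgrind_paranoid = (NULL != val && 0 == strcmp(val, "1"));
}

/* Allocate storage for an object of the given size.
 * Non-debug builds always malloc(); debug builds malloc() unless the
 * param asks for zeroed memory, in which case they calloc(). */
static void *sketch_obj_alloc(size_t size)
{
    assert(size > 0);
#if SKETCH_ENABLE_DEBUG
    if (sketch_valgrind_paranoid) {
        return calloc(1, size);
    }
#endif
    return malloc(size);
}
```

With the flag left at its default, debug and non-debug builds take the same
malloc() path, so zeroing cannot hide a bug unless someone opts in for a
valgrind run.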

-Paul

On Fri, Oct 3, 2014 at 3:02 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> It's a tough call. This proposal will create significant differences
> between the debug and fast builds. As the entire objects will be set to
> zero this might mask bugs in the debug build, bugs that will be horribly
> difficult to track in any non-debug builds. Moreover, if the structures are
> carefully accessed in our code, adding such a disruptive initialization
> just to prevent valgrind from reporting false positives about uninitialized
> reads in memcpy is too costly as a solution (I am also aware that it
> will be almost impossible to write a valgrind suppression rule for the
> specific case you mention).
>
> Some parts of the code have (or at least had) some level of cleanliness for
> the gaps in the structures. The solution was to zero only the gaps,
> maintaining the same behavior between debug and non-debug builds. However,
> in order to do this one needs to know the layout of the structure, so this
> is not a completely generic solution...
>
>   George.
>
>
> On Oct 3, 2014, at 16:54 , Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
>
> > WHAT: change the malloc() to calloc() in opal_obj_new() (perhaps only in
> > debug builds?)
> >
> > WHY: Drastically reduces valgrind output
> >
> > WHERE: see
> https://github.com/open-mpi/ompi/blob/master/opal/class/opal_object.h#L462-L467
> >
> > TIMEOUT: teleconf, Tue, Oct 14 (there's no rush)
> >
> > MORE DETAIL:
> >
> > I was debugging some code today and came across a bunch of places where
> > we write structs down various IPC mechanisms, and the structs contain
> > holes.  In most places, the performance doesn't matter / the readability of
> > struct members is more important, so we haven't re-ordered the structs to
> > remove holes.  But consequently, those holes end up uninitialized, and
> > therefore memcpy()ing or write()ing instances of these structs causes
> > valgrind to emit warnings.
> >
> > The patch below eliminates most (all?) of these valgrind warnings -- in
> > debug builds, it changes the malloc() inside OBJ_NEW to a calloc().
> >
> > Upon a little more thought, however, I wonder if we use OBJ_NEW in any
> > fast code paths (other than in bulk, such as when we need to grow a free
> > list).  Specifically: would it be terrible to *always* calloc -- not just
> > for debug builds?
> >
> > -----
> > diff --git a/opal/class/opal_object.h b/opal/class/opal_object.h
> > index 7012bac..585f13e 100644
> > --- a/opal/class/opal_object.h
> > +++ b/opal/class/opal_object.h
> > @@ -464,7 +464,11 @@ static inline opal_object_t *opal_obj_new(opal_class_t * cl
> >     opal_object_t *object;
> >     assert(cls->cls_sizeof >= sizeof(opal_object_t));
> >
> > +#if OPAL_ENABLE_DEBUG
> > +    object = (opal_object_t *) calloc(1, cls->cls_sizeof);
> > +#else
> >     object = (opal_object_t *) malloc(cls->cls_sizeof);
> > +#endif
> >     if (0 == cls->cls_initialized) {
> >         opal_class_initialize(cls);
> >     }
> > -----
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16001.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16004.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
