(Apologies if this is a double send)

I'll open a PR on this soon. To update the dev guide.

Given this standard there are few accessor methods that I think we should
either convert or create a new accessor that does the correct thing with
respect to return type.  Given how core these methods are I think the
latter might be a better approach (but I don't feel too strongly if others
have a good rationale one way or another):
1. Array::Data() [1] - In looking at some CPU profiles it seems like most
of the time spent in Validate is due to shared_ptr
construction/destruction.  In auditing the code this method appears to be
the only one returning copies.

2. RecordBatch::Column* [2] - These are more questionable since they are
virtual methods, it is not clear if dynamic Record batches where the
intention behind this design. So it might not be worth it.  Anecdotally,
I've known people who have written some naive iteration code use these
methods where shared_ptr construction/destruction contributed 10% overhead.

Thanks,
Micah



[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
[2]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98

On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <wesmck...@gmail.com> wrote:

> Agreed. We should probably document this in the C++ developer docs.
>
> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > Hi Micah,
> >
> > That's roughly my mental model as well.
> >
> > However, for 4) I would say that return a const ref to shared_ptr if
> > preferable because the caller will often need the ownership (especially
> > with Array, ArrayData, DataType, etc.).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > > I'm not sure how consistent we are with how shared_ptr is used as a
> > > parameter to methods and as a return type.  In reviewing and writing
> code
> > > I've been using these guidelines for myself and I was wondering if they
> > > align with others:
> > >
> > > 1.  If a copy of a shared_ptr is not intended to be made by the method
> then
> > > use a const ref to underlying type.  i.e. void Foo(const Array& array)
> is
> > > preferable to void Foo(const shared_ptr<Array>& array) [1].
> > >
> > > 2.  If a copy is always going to be made pass by value.  i.e. void
> > > Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> last
> > > time I did research on this allowed for eliminating shared_ptr
> overhead if
> > > the caller also can std::move() the parameter.
> > >
> > > 3.  If a copy might be made pass the shared_ptr by const reference.
> i.e. void
> > > Foo(const shared_ptr<T>& array) The exception to this if the contents
> of
> > > the shared_ptr a reference can effectively be copied cheaply without
> as is
> > > the case with Array via ArrayData in which case #1 applies.
> > >
> > > 4. For accessor methods prefer returning by const ref or underlying
> ref to
> > > underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> foo()  or
> > > const Array& Foo().
> > >
> > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > > MakeFoo();
> > >
> > > Is this other people's mental model?  I'd like to update our style
> guide so
> > > we can hopefully drive consistency over time.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] Array is somewhat of a special case because one can have
> essentially
> > > the same shared_ptr copy semantics by copying the underlying ArrayData
> > > object.
> > >
>

Reply via email to