Thanks for looking into this, Micah.

One convention I'd like to append: we mostly avoid convenience typedefs,
but an
exception is the common case of `vector<shared_ptr<{class name}>>`, for
which we
allow and encourage the use of `{class name}Vector` typedefs. (Conversely,
nothing
should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)

1. Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196)

2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a
single concrete
   subclass (SimpleRecordBatch). I agree that RecordBatch's assessors
should avoid
   unnecessary construction of shared_ptrs, and demoting it to a concrete
class would
   make it clear that this is safe.

Ben

On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> (Apologies if this is a double send)
>
> I'll open a PR on this soon. To update the dev guide.
>
> Given this standard there are few accessor methods that I think we should
> either convert or create a new accessor that does the correct thing with
> respect to return type.  Given how core these methods are I think the
> latter might be a better approach (but I don't feel too strongly if others
> have a good rationale one way or another):
> 1. Array::Data() [1] - In looking at some CPU profiles it seems like most
> of the time spent in Validate is due to shared_ptr
> construction/destruction.  In auditing the code this method appears to be
> the only one returning copies.
>
> 2. RecordBatch::Column* [2] - These are more questionable since they are
> virtual methods, it is not clear if dynamic Record batches where the
> intention behind this design. So it might not be worth it.  Anecdotally,
> I've known people who have written some naive iteration code use these
> methods where shared_ptr construction/destruction contributed 10% overhead.
>
> Thanks,
> Micah
>
>
>
> [1]
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
> [2]
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
>
> On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Agreed. We should probably document this in the C++ developer docs.
> >
> > On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <anto...@python.org>
> wrote:
> > >
> > >
> > > Hi Micah,
> > >
> > > That's roughly my mental model as well.
> > >
> > > However, for 4) I would say that return a const ref to shared_ptr if
> > > preferable because the caller will often need the ownership (especially
> > > with Array, ArrayData, DataType, etc.).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > > > I'm not sure how consistent we are with how shared_ptr is used as a
> > > > parameter to methods and as a return type.  In reviewing and writing
> > code
> > > > I've been using these guidelines for myself and I was wondering if
> they
> > > > align with others:
> > > >
> > > > 1.  If a copy of a shared_ptr is not intended to be made by the
> method
> > then
> > > > use a const ref to underlying type.  i.e. void Foo(const Array&
> array)
> > is
> > > > preferable to void Foo(const shared_ptr<Array>& array) [1].
> > > >
> > > > 2.  If a copy is always going to be made pass by value.  i.e. void
> > > > Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> > last
> > > > time I did research on this allowed for eliminating shared_ptr
> > overhead if
> > > > the caller also can std::move() the parameter.
> > > >
> > > > 3.  If a copy might be made pass the shared_ptr by const reference.
> > i.e. void
> > > > Foo(const shared_ptr<T>& array) The exception to this if the contents
> > of
> > > > the shared_ptr a reference can effectively be copied cheaply without
> > as is
> > > > the case with Array via ArrayData in which case #1 applies.
> > > >
> > > > 4. For accessor methods prefer returning by const ref or underlying
> > ref to
> > > > underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> > foo()  or
> > > > const Array& Foo().
> > > >
> > > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > > > MakeFoo();
> > > >
> > > > Is this other people's mental model?  I'd like to update our style
> > guide so
> > > > we can hopefully drive consistency over time.
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] Array is somewhat of a special case because one can have
> > essentially
> > > > the same shared_ptr copy semantics by copying the underlying
> ArrayData
> > > > object.
> > > >
> >
>

Reply via email to