Thanks for looking into this, Micah. One convention I'd like to append: we mostly avoid convenience typedefs, but an exception is the common case of `vector<shared_ptr<{class name}>>`, for which we allow and encourage the use of `{class name}Vector` typedefs. (Conversely, nothing should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)
1. Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196) 2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a single concrete subclass (SimpleRecordBatch). I agree that RecordBatch's assessors should avoid unnecessary construction of shared_ptrs, and demoting it to a concrete class would make it clear that this is safe. Ben On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > (Apologies if this is a double send) > > I'll open a PR on this soon. To update the dev guide. > > Given this standard there are few accessor methods that I think we should > either convert or create a new accessor that does the correct thing with > respect to return type. Given how core these methods are I think the > latter might be a better approach (but I don't feel too strongly if others > have a good rationale one way or another): > 1. Array::Data() [1] - In looking at some CPU profiles it seems like most > of the time spent in Validate is due to shared_ptr > construction/destruction. In auditing the code this method appears to be > the only one returning copies. > > 2. RecordBatch::Column* [2] - These are more questionable since they are > virtual methods, it is not clear if dynamic Record batches where the > intention behind this design. So it might not be worth it. Anecdotally, > I've known people who have written some naive iteration code use these > methods where shared_ptr construction/destruction contributed 10% overhead. > > Thanks, > Micah > > > > [1] > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163 > [2] > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98 > > On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <wesmck...@gmail.com> wrote: > > > Agreed. We should probably document this in the C++ developer docs. > > > > On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <anto...@python.org> > wrote: > > > > > > > > > Hi Micah, > > > > > > That's roughly my mental model as well. > > > > > > However, for 4) I would say that return a const ref to shared_ptr if > > > preferable because the caller will often need the ownership (especially > > > with Array, ArrayData, DataType, etc.). > > > > > > Regards > > > > > > Antoine. > > > > > > > > > Le 08/02/2021 à 18:02, Micah Kornfield a écrit : > > > > I'm not sure how consistent we are with how shared_ptr is used as a > > > > parameter to methods and as a return type. In reviewing and writing > > code > > > > I've been using these guidelines for myself and I was wondering if > they > > > > align with others: > > > > > > > > 1. If a copy of a shared_ptr is not intended to be made by the > method > > then > > > > use a const ref to underlying type. i.e. void Foo(const Array& > array) > > is > > > > preferable to void Foo(const shared_ptr<Array>& array) [1]. > > > > > > > > 2. If a copy is always going to be made pass by value. i.e. void > > > > Foo(std::shared_ptr<Array>) and to std::move within the method. The > > last > > > > time I did research on this allowed for eliminating shared_ptr > > overhead if > > > > the caller also can std::move() the parameter. > > > > > > > > 3. If a copy might be made pass the shared_ptr by const reference. > > i.e. void > > > > Foo(const shared_ptr<T>& array) The exception to this if the contents > > of > > > > the shared_ptr a reference can effectively be copied cheaply without > > as is > > > > the case with Array via ArrayData in which case #1 applies. > > > > > > > > 4. For accessor methods prefer returning by const ref or underlying > > ref to > > > > underlying when appropriate. i.e. const std::shared_ptr<Array>& > > foo() or > > > > const Array& Foo(). > > > > > > > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array> > > > > MakeFoo(); > > > > > > > > Is this other people's mental model? I'd like to update our style > > guide so > > > > we can hopefully drive consistency over time. > > > > > > > > Thanks, > > > > Micah > > > > > > > > [1] Array is somewhat of a special case because one can have > > essentially > > > > the same shared_ptr copy semantics by copying the underlying > ArrayData > > > > object. > > > > > > >