(Apologies if this is a double send) I'll open a PR on this soon. To update the dev guide.
Given this standard there are few accessor methods that I think we should either convert or create a new accessor that does the correct thing with respect to return type. Given how core these methods are I think the latter might be a better approach (but I don't feel too strongly if others have a good rationale one way or another): 1. Array::Data() [1] - In looking at some CPU profiles it seems like most of the time spent in Validate is due to shared_ptr construction/destruction. In auditing the code this method appears to be the only one returning copies. 2. RecordBatch::Column* [2] - These are more questionable since they are virtual methods, it is not clear if dynamic Record batches where the intention behind this design. So it might not be worth it. Anecdotally, I've known people who have written some naive iteration code use these methods where shared_ptr construction/destruction contributed 10% overhead. Thanks, Micah [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163 [2] https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98 On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <wesmck...@gmail.com> wrote: > Agreed. We should probably document this in the C++ developer docs. > > On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <anto...@python.org> wrote: > > > > > > Hi Micah, > > > > That's roughly my mental model as well. > > > > However, for 4) I would say that return a const ref to shared_ptr if > > preferable because the caller will often need the ownership (especially > > with Array, ArrayData, DataType, etc.). > > > > Regards > > > > Antoine. > > > > > > Le 08/02/2021 à 18:02, Micah Kornfield a écrit : > > > I'm not sure how consistent we are with how shared_ptr is used as a > > > parameter to methods and as a return type. In reviewing and writing > code > > > I've been using these guidelines for myself and I was wondering if they > > > align with others: > > > > > > 1. If a copy of a shared_ptr is not intended to be made by the method > then > > > use a const ref to underlying type. i.e. void Foo(const Array& array) > is > > > preferable to void Foo(const shared_ptr<Array>& array) [1]. > > > > > > 2. If a copy is always going to be made pass by value. i.e. void > > > Foo(std::shared_ptr<Array>) and to std::move within the method. The > last > > > time I did research on this allowed for eliminating shared_ptr > overhead if > > > the caller also can std::move() the parameter. > > > > > > 3. If a copy might be made pass the shared_ptr by const reference. > i.e. void > > > Foo(const shared_ptr<T>& array) The exception to this if the contents > of > > > the shared_ptr a reference can effectively be copied cheaply without > as is > > > the case with Array via ArrayData in which case #1 applies. > > > > > > 4. For accessor methods prefer returning by const ref or underlying > ref to > > > underlying when appropriate. i.e. const std::shared_ptr<Array>& > foo() or > > > const Array& Foo(). > > > > > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array> > > > MakeFoo(); > > > > > > Is this other people's mental model? I'd like to update our style > guide so > > > we can hopefully drive consistency over time. > > > > > > Thanks, > > > Micah > > > > > > [1] Array is somewhat of a special case because one can have > essentially > > > the same shared_ptr copy semantics by copying the underlying ArrayData > > > object. > > > >