+1 for former.

On Sat, Apr 24, 2021 at 9:30 AM Wes McKinney <[email protected]> wrote:

> Yes, I think the former would be acceptable. I don't think that anyone
> should be putting Buffer::mutable_data() on the inner loop of a hot
> path.
>
> On Sat, Apr 24, 2021 at 8:09 AM Antoine Pitrou <[email protected]> wrote:
> >
> >
> > It depends what that entails exactly? Would we introduce a conditional
> > in release mode:
> >
> >    uint8_t* mutable_data() {
> >      return is_mutable() ? const_cast<uint8_t*>(data()) : nullptr;
> >    }
> >
> > or would be always blindly return the data pointer, which seems
> > extremely dangerous to me:
> >
> >    uint8_t* mutable_data() {
> >      return const_cast<uint8_t*>(data());
> >    }
> >
> > I would be ok with the former.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 24/04/2021 à 15:05, Wes McKinney a écrit :
> > > hi folks,
> > >
> > > Thoughts about this? Since we already assert that is_mutable_ is true
> > > in debug builds when accessing mutable_data_, using a const cast here
> > > seems relatively benign, and then we can drop 8 bytes from the Buffer
> > > struct
> > >
> > > On Wed, Apr 21, 2021 at 10:10 AM Wes McKinney <[email protected]>
> wrote:
> > >>
> > >> I'd be open to this simplification of the data structure, but I'll see
> > >> what others think
> > >>
> > >> On Wed, Apr 21, 2021 at 9:26 AM Niranda Perera <
> [email protected]> wrote:
> > >>>
> > >>> @Wes McKinney On a separate note, why would there be 2 pointers in
> the Buffer class, const uint8_t* data_ &  uint8_t* mutable_data_? Can't we
> have 1 pointer and then use const_cast<> in mutable_data() method (or vise
> versa)?
> > >>>
> > >>> On Wed, Apr 21, 2021 at 10:09 AM Niranda Perera <
> [email protected]> wrote:
> > >>>>
> > >>>> Sorry, that's exactly what you've mentioned in the jira. :-) Please
> ignore!
> > >>>>
> > >>>> On Wed, Apr 21, 2021 at 10:07 AM Niranda Perera <
> [email protected]> wrote:
> > >>>>>
> > >>>>> @Wes, @Antoine,
> > >>>>> As @Weston pointed out, it seems like the issue is here.
> > >>>>>
> https://github.com/apache/arrow/blob/37c27d1eaf0fa61281ad103c08a0251bb6883ec4/cpp/src/arrow/python/numpy_convert.cc#L51
> > >>>>> When the numpy buffer's is_mutable_ marked as true, ideally,
> *mutable_data_ should have also been set IMO.
> > >>>>>
> > >>>>> On Wed, Apr 21, 2021 at 10:00 AM Wes McKinney <[email protected]>
> wrote:
> > >>>>>>
> > >>>>>> Definitely a bug. I just opened
> > >>>>>>
> > >>>>>> https://issues.apache.org/jira/browse/ARROW-12495
> > >>>>>>
> > >>>>>> If there's a chance to get this into 4.0.0 this would be a nice
> one
> > >>>>>> but I suspect the next RC is already under way (it need not block
> > >>>>>> since this bug has been present a long time)
> > >>>>>>
> > >>>>>> On Wed, Apr 21, 2021 at 3:31 AM Antoine Pitrou <
> [email protected]> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> It sounds like a bug if is_mutable_ is true but mutable_data_ is
> nullptr.
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>>
> > >>>>>>> Antoine.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Le 21/04/2021 à 03:17, Weston Pace a écrit :
> > >>>>>>>> If it comes from pandas (and is eligible for zero-copy) then the
> > >>>>>>>> buffer implementation will be `NumPyBuffer`.  Printing one in
> GDB
> > >>>>>>>> yields...
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> $12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for
> > >>>>>>>> arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true,
> data_
> > >>>>>>>> = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16,
> capacity_ =
> > >>>>>>>> 16,
> > >>>>>>>>     parent_ = {<std::__shared_ptr<arrow::Buffer,
> > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> =
> > >>>>>>>> {<std::__shared_ptr_access<arrow::Buffer,
> (__gnu_cxx::_Lock_policy)2,
> > >>>>>>>> false, false>> = {<No data fields>}, _M_ptr = 0x0,
> > >>>>>>>>         _M_refcount = {_M_pi = 0x0}}, <No data fields>},
> > >>>>>>>>     memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager,
> > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> =
> > >>>>>>>> {<std::__shared_ptr_access<arrow::MemoryManager,
> > >>>>>>>> (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>},
> > >>>>>>>>         _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi =
> > >>>>>>>> 0x55b71fb90640}}, <No data fields>}}
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>> Notice that `is_cpu_` and `is_mutable_` are both `true`.  It's
> maybe a
> > >>>>>>>> bug that `is_mutable_` is true.  Although maybe not as it
> appears to
> > >>>>>>>> be telling whether the underlying numpy buffer itself is
> mutable or
> > >>>>>>>> not...
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) {
> > >>>>>>>>       is_mutable_ = true;
> > >>>>>>>> }
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera <
> [email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi all,
> > >>>>>>>>>
> > >>>>>>>>> We have been using Arrow v2.0.0 and we encountered the
> following issue.
> > >>>>>>>>>
> > >>>>>>>>> I was reading a table with numeric data using pandas.read_csv
> and then
> > >>>>>>>>> converting it into pyarrow table. In our application (Cylon
> > >>>>>>>>> <https://github.com/cylondata/cylon>), we are accessing this
> pyarrow table
> > >>>>>>>>> from c++. We want to access the mutable data of the arrays in
> the pyarrow
> > >>>>>>>>> table.
> > >>>>>>>>>
> > >>>>>>>>> But the following returns a nullptr.
> > >>>>>>>>> T *mutable_data = array->data()->GetMutableValues<T>(1); //
> returns nullptr
> > >>>>>>>>>
> > >>>>>>>>> Interestingly,
> > >>>>>>>>> array->data()->buffers[1]->IsMutable(); // returns true
> > >>>>>>>>> array->data()->buffers[1]->IsCpu(); // returns true
> > >>>>>>>>>
> > >>>>>>>>> This only happens when I use pandas df to create a pyarrow
> table. It
> > >>>>>>>>> wouldn't happen when I use pyarrow.read_csv. So, I am guessing
> there's some
> > >>>>>>>>> issue in the buffer creation from pandas df.
> > >>>>>>>>>
> > >>>>>>>>> Is this an expected behavior? or has this been resolved in
> v2.0< releases?
> > >>>>>>>>>
> > >>>>>>>>> Best
> > >>>>>>>>> --
> > >>>>>>>>> Niranda Perera
> > >>>>>>>>> https://niranda.dev/
> > >>>>>>>>> @n1r44 <https://twitter.com/N1R44>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Niranda Perera
> > >>>>> https://niranda.dev/
> > >>>>> @n1r44
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Niranda Perera
> > >>>> https://niranda.dev/
> > >>>> @n1r44
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Niranda Perera
> > >>> https://niranda.dev/
> > >>> @n1r44
> > >>>
>


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>

Reply via email to