Yes, I think the former would be acceptable. I don't think that anyone should be putting Buffer::mutable_data() on the inner loop of a hot path.
On Sat, Apr 24, 2021 at 8:09 AM Antoine Pitrou <[email protected]> wrote: > > > It depends what that entails exactly? Would we introduce a conditional > in release mode: > > uint8_t* mutable_data() { > return is_mutable() ? const_cast<uint8_t*>(data()) : nullptr; > } > > or would be always blindly return the data pointer, which seems > extremely dangerous to me: > > uint8_t* mutable_data() { > return const_cast<uint8_t*>(data()); > } > > I would be ok with the former. > > Regards > > Antoine. > > > Le 24/04/2021 à 15:05, Wes McKinney a écrit : > > hi folks, > > > > Thoughts about this? Since we already assert that is_mutable_ is true > > in debug builds when accessing mutable_data_, using a const cast here > > seems relatively benign, and then we can drop 8 bytes from the Buffer > > struct > > > > On Wed, Apr 21, 2021 at 10:10 AM Wes McKinney <[email protected]> wrote: > >> > >> I'd be open to this simplification of the data structure, but I'll see > >> what others think > >> > >> On Wed, Apr 21, 2021 at 9:26 AM Niranda Perera <[email protected]> > >> wrote: > >>> > >>> @Wes McKinney On a separate note, why would there be 2 pointers in the > >>> Buffer class, const uint8_t* data_ & uint8_t* mutable_data_? Can't we > >>> have 1 pointer and then use const_cast<> in mutable_data() method (or > >>> vise versa)? > >>> > >>> On Wed, Apr 21, 2021 at 10:09 AM Niranda Perera > >>> <[email protected]> wrote: > >>>> > >>>> Sorry, that's exactly what you've mentioned in the jira. :-) Please > >>>> ignore! > >>>> > >>>> On Wed, Apr 21, 2021 at 10:07 AM Niranda Perera > >>>> <[email protected]> wrote: > >>>>> > >>>>> @Wes, @Antoine, > >>>>> As @Weston pointed out, it seems like the issue is here. > >>>>> https://github.com/apache/arrow/blob/37c27d1eaf0fa61281ad103c08a0251bb6883ec4/cpp/src/arrow/python/numpy_convert.cc#L51 > >>>>> When the numpy buffer's is_mutable_ marked as true, ideally, > >>>>> *mutable_data_ should have also been set IMO. > >>>>> > >>>>> On Wed, Apr 21, 2021 at 10:00 AM Wes McKinney <[email protected]> > >>>>> wrote: > >>>>>> > >>>>>> Definitely a bug. I just opened > >>>>>> > >>>>>> https://issues.apache.org/jira/browse/ARROW-12495 > >>>>>> > >>>>>> If there's a chance to get this into 4.0.0 this would be a nice one > >>>>>> but I suspect the next RC is already under way (it need not block > >>>>>> since this bug has been present a long time) > >>>>>> > >>>>>> On Wed, Apr 21, 2021 at 3:31 AM Antoine Pitrou <[email protected]> > >>>>>> wrote: > >>>>>>> > >>>>>>> > >>>>>>> It sounds like a bug if is_mutable_ is true but mutable_data_ is > >>>>>>> nullptr. > >>>>>>> > >>>>>>> Regards > >>>>>>> > >>>>>>> Antoine. > >>>>>>> > >>>>>>> > >>>>>>> Le 21/04/2021 à 03:17, Weston Pace a écrit : > >>>>>>>> If it comes from pandas (and is eligible for zero-copy) then the > >>>>>>>> buffer implementation will be `NumPyBuffer`. Printing one in GDB > >>>>>>>> yields... > >>>>>>>> > >>>>>>>> ``` > >>>>>>>> $12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for > >>>>>>>> arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true, data_ > >>>>>>>> = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, capacity_ = > >>>>>>>> 16, > >>>>>>>> parent_ = {<std::__shared_ptr<arrow::Buffer, > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> = > >>>>>>>> {<std::__shared_ptr_access<arrow::Buffer, (__gnu_cxx::_Lock_policy)2, > >>>>>>>> false, false>> = {<No data fields>}, _M_ptr = 0x0, > >>>>>>>> _M_refcount = {_M_pi = 0x0}}, <No data fields>}, > >>>>>>>> memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager, > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> = > >>>>>>>> {<std::__shared_ptr_access<arrow::MemoryManager, > >>>>>>>> (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, > >>>>>>>> _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi = > >>>>>>>> 0x55b71fb90640}}, <No data fields>}} > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> Notice that `is_cpu_` and `is_mutable_` are both `true`. It's maybe > >>>>>>>> a > >>>>>>>> bug that `is_mutable_` is true. Although maybe not as it appears to > >>>>>>>> be telling whether the underlying numpy buffer itself is mutable or > >>>>>>>> not... > >>>>>>>> > >>>>>>>> ``` > >>>>>>>> if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) { > >>>>>>>> is_mutable_ = true; > >>>>>>>> } > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> > >>>>>>>> On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera > >>>>>>>> <[email protected]> wrote: > >>>>>>>>> > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> We have been using Arrow v2.0.0 and we encountered the following > >>>>>>>>> issue. > >>>>>>>>> > >>>>>>>>> I was reading a table with numeric data using pandas.read_csv and > >>>>>>>>> then > >>>>>>>>> converting it into pyarrow table. In our application (Cylon > >>>>>>>>> <https://github.com/cylondata/cylon>), we are accessing this > >>>>>>>>> pyarrow table > >>>>>>>>> from c++. We want to access the mutable data of the arrays in the > >>>>>>>>> pyarrow > >>>>>>>>> table. > >>>>>>>>> > >>>>>>>>> But the following returns a nullptr. > >>>>>>>>> T *mutable_data = array->data()->GetMutableValues<T>(1); // returns > >>>>>>>>> nullptr > >>>>>>>>> > >>>>>>>>> Interestingly, > >>>>>>>>> array->data()->buffers[1]->IsMutable(); // returns true > >>>>>>>>> array->data()->buffers[1]->IsCpu(); // returns true > >>>>>>>>> > >>>>>>>>> This only happens when I use pandas df to create a pyarrow table. It > >>>>>>>>> wouldn't happen when I use pyarrow.read_csv. So, I am guessing > >>>>>>>>> there's some > >>>>>>>>> issue in the buffer creation from pandas df. > >>>>>>>>> > >>>>>>>>> Is this an expected behavior? or has this been resolved in v2.0< > >>>>>>>>> releases? > >>>>>>>>> > >>>>>>>>> Best > >>>>>>>>> -- > >>>>>>>>> Niranda Perera > >>>>>>>>> https://niranda.dev/ > >>>>>>>>> @n1r44 <https://twitter.com/N1R44> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Niranda Perera > >>>>> https://niranda.dev/ > >>>>> @n1r44 > >>>>> > >>>> > >>>> > >>>> -- > >>>> Niranda Perera > >>>> https://niranda.dev/ > >>>> @n1r44 > >>>> > >>> > >>> > >>> -- > >>> Niranda Perera > >>> https://niranda.dev/ > >>> @n1r44 > >>>
