+1 for former. On Sat, Apr 24, 2021 at 9:30 AM Wes McKinney <[email protected]> wrote:
> Yes, I think the former would be acceptable. I don't think that anyone > should be putting Buffer::mutable_data() on the inner loop of a hot > path. > > On Sat, Apr 24, 2021 at 8:09 AM Antoine Pitrou <[email protected]> wrote: > > > > > > It depends what that entails exactly? Would we introduce a conditional > > in release mode: > > > > uint8_t* mutable_data() { > > return is_mutable() ? const_cast<uint8_t*>(data()) : nullptr; > > } > > > > or would be always blindly return the data pointer, which seems > > extremely dangerous to me: > > > > uint8_t* mutable_data() { > > return const_cast<uint8_t*>(data()); > > } > > > > I would be ok with the former. > > > > Regards > > > > Antoine. > > > > > > Le 24/04/2021 à 15:05, Wes McKinney a écrit : > > > hi folks, > > > > > > Thoughts about this? Since we already assert that is_mutable_ is true > > > in debug builds when accessing mutable_data_, using a const cast here > > > seems relatively benign, and then we can drop 8 bytes from the Buffer > > > struct > > > > > > On Wed, Apr 21, 2021 at 10:10 AM Wes McKinney <[email protected]> > wrote: > > >> > > >> I'd be open to this simplification of the data structure, but I'll see > > >> what others think > > >> > > >> On Wed, Apr 21, 2021 at 9:26 AM Niranda Perera < > [email protected]> wrote: > > >>> > > >>> @Wes McKinney On a separate note, why would there be 2 pointers in > the Buffer class, const uint8_t* data_ & uint8_t* mutable_data_? Can't we > have 1 pointer and then use const_cast<> in mutable_data() method (or vise > versa)? > > >>> > > >>> On Wed, Apr 21, 2021 at 10:09 AM Niranda Perera < > [email protected]> wrote: > > >>>> > > >>>> Sorry, that's exactly what you've mentioned in the jira. :-) Please > ignore! > > >>>> > > >>>> On Wed, Apr 21, 2021 at 10:07 AM Niranda Perera < > [email protected]> wrote: > > >>>>> > > >>>>> @Wes, @Antoine, > > >>>>> As @Weston pointed out, it seems like the issue is here. > > >>>>> > https://github.com/apache/arrow/blob/37c27d1eaf0fa61281ad103c08a0251bb6883ec4/cpp/src/arrow/python/numpy_convert.cc#L51 > > >>>>> When the numpy buffer's is_mutable_ marked as true, ideally, > *mutable_data_ should have also been set IMO. > > >>>>> > > >>>>> On Wed, Apr 21, 2021 at 10:00 AM Wes McKinney <[email protected]> > wrote: > > >>>>>> > > >>>>>> Definitely a bug. I just opened > > >>>>>> > > >>>>>> https://issues.apache.org/jira/browse/ARROW-12495 > > >>>>>> > > >>>>>> If there's a chance to get this into 4.0.0 this would be a nice > one > > >>>>>> but I suspect the next RC is already under way (it need not block > > >>>>>> since this bug has been present a long time) > > >>>>>> > > >>>>>> On Wed, Apr 21, 2021 at 3:31 AM Antoine Pitrou < > [email protected]> wrote: > > >>>>>>> > > >>>>>>> > > >>>>>>> It sounds like a bug if is_mutable_ is true but mutable_data_ is > nullptr. > > >>>>>>> > > >>>>>>> Regards > > >>>>>>> > > >>>>>>> Antoine. > > >>>>>>> > > >>>>>>> > > >>>>>>> Le 21/04/2021 à 03:17, Weston Pace a écrit : > > >>>>>>>> If it comes from pandas (and is eligible for zero-copy) then the > > >>>>>>>> buffer implementation will be `NumPyBuffer`. Printing one in > GDB > > >>>>>>>> yields... > > >>>>>>>> > > >>>>>>>> ``` > > >>>>>>>> $12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for > > >>>>>>>> arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true, > data_ > > >>>>>>>> = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, > capacity_ = > > >>>>>>>> 16, > > >>>>>>>> parent_ = {<std::__shared_ptr<arrow::Buffer, > > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> = > > >>>>>>>> {<std::__shared_ptr_access<arrow::Buffer, > (__gnu_cxx::_Lock_policy)2, > > >>>>>>>> false, false>> = {<No data fields>}, _M_ptr = 0x0, > > >>>>>>>> _M_refcount = {_M_pi = 0x0}}, <No data fields>}, > > >>>>>>>> memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager, > > >>>>>>>> (__gnu_cxx::_Lock_policy)2>> = > > >>>>>>>> {<std::__shared_ptr_access<arrow::MemoryManager, > > >>>>>>>> (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, > > >>>>>>>> _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi = > > >>>>>>>> 0x55b71fb90640}}, <No data fields>}} > > >>>>>>>> ``` > > >>>>>>>> > > >>>>>>>> Notice that `is_cpu_` and `is_mutable_` are both `true`. It's > maybe a > > >>>>>>>> bug that `is_mutable_` is true. Although maybe not as it > appears to > > >>>>>>>> be telling whether the underlying numpy buffer itself is > mutable or > > >>>>>>>> not... > > >>>>>>>> > > >>>>>>>> ``` > > >>>>>>>> if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) { > > >>>>>>>> is_mutable_ = true; > > >>>>>>>> } > > >>>>>>>> ``` > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera < > [email protected]> wrote: > > >>>>>>>>> > > >>>>>>>>> Hi all, > > >>>>>>>>> > > >>>>>>>>> We have been using Arrow v2.0.0 and we encountered the > following issue. > > >>>>>>>>> > > >>>>>>>>> I was reading a table with numeric data using pandas.read_csv > and then > > >>>>>>>>> converting it into pyarrow table. In our application (Cylon > > >>>>>>>>> <https://github.com/cylondata/cylon>), we are accessing this > pyarrow table > > >>>>>>>>> from c++. We want to access the mutable data of the arrays in > the pyarrow > > >>>>>>>>> table. > > >>>>>>>>> > > >>>>>>>>> But the following returns a nullptr. > > >>>>>>>>> T *mutable_data = array->data()->GetMutableValues<T>(1); // > returns nullptr > > >>>>>>>>> > > >>>>>>>>> Interestingly, > > >>>>>>>>> array->data()->buffers[1]->IsMutable(); // returns true > > >>>>>>>>> array->data()->buffers[1]->IsCpu(); // returns true > > >>>>>>>>> > > >>>>>>>>> This only happens when I use pandas df to create a pyarrow > table. It > > >>>>>>>>> wouldn't happen when I use pyarrow.read_csv. So, I am guessing > there's some > > >>>>>>>>> issue in the buffer creation from pandas df. > > >>>>>>>>> > > >>>>>>>>> Is this an expected behavior? or has this been resolved in > v2.0< releases? > > >>>>>>>>> > > >>>>>>>>> Best > > >>>>>>>>> -- > > >>>>>>>>> Niranda Perera > > >>>>>>>>> https://niranda.dev/ > > >>>>>>>>> @n1r44 <https://twitter.com/N1R44> > > >>>>> > > >>>>> > > >>>>> > > >>>>> -- > > >>>>> Niranda Perera > > >>>>> https://niranda.dev/ > > >>>>> @n1r44 > > >>>>> > > >>>> > > >>>> > > >>>> -- > > >>>> Niranda Perera > > >>>> https://niranda.dev/ > > >>>> @n1r44 > > >>>> > > >>> > > >>> > > >>> -- > > >>> Niranda Perera > > >>> https://niranda.dev/ > > >>> @n1r44 > > >>> > -- Niranda Perera https://niranda.dev/ @n1r44 <https://twitter.com/N1R44>
