jorisvandenbossche commented on code in PR #38472: URL: https://github.com/apache/arrow/pull/38472#discussion_r1417052191
##########
cpp/src/arrow/c/dlpack.cc:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/c/dlpack.h"
+
+#include "arrow/array/array_base.h"
+#include "arrow/c/dlpack_abi.h"
+#include "arrow/device.h"
+#include "arrow/type.h"
+
+namespace arrow::dlpack {
+
+Result<DLDataType> GetDLDataType(const DataType& type) {
+  DLDataType dtype;
+  dtype.lanes = 1;
+  dtype.bits = type.bit_width();
+  switch (type.id()) {
+    case Type::INT8:
+    case Type::INT16:
+    case Type::INT32:
+    case Type::INT64:
+      dtype.code = DLDataTypeCode::kDLInt;
+      return dtype;
+    case Type::UINT8:
+    case Type::UINT16:
+    case Type::UINT32:
+    case Type::UINT64:
+      dtype.code = DLDataTypeCode::kDLUInt;
+      return dtype;
+    case Type::HALF_FLOAT:
+    case Type::FLOAT:
+    case Type::DOUBLE:
+      dtype.code = DLDataTypeCode::kDLFloat;
+      return dtype;
+    case Type::BOOL:
+      // DLPack supports byte-packed boolean values
+      return Status::TypeError("Bit-packed boolean data type not supported by DLPack.");
+    default:
+      return Status::TypeError("DataType is not compatible with DLPack spec: ",
+                               type.ToString());
+  }
+}
+
+struct DLMTensorCtx {
+  std::shared_ptr<ArrayData> ref;
+  std::vector<int64_t> shape;

Review Comment:
   The shape can now be removed from the context?


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure
+that allows exchange between major frameworks working
+with multidimensional arrays or tensors. It is
+designed for cross hardware support meaning it allows exchange
+of data on devices other than the CPU (e.g. GPU).
+
+DLPack protocol had been
+`selected as the Python array API standard <https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-an-in-memory-tensor-structure>`_
+by the
+`Consortium for Python Data API Standards <https://data-apis.org/>`_
+in order to enable device aware data interchange between array/tensor
+libraries in the Python ecosystem. See more about the standard
+in the
+`protocol documentation <https://data-apis.org/array-api/latest/index.html>`_
+and more about the DLPack in the

Review Comment:
   ```suggestion
   and more about DLPack in the
   ```


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure
+that allows exchange between major frameworks working
+with multidimensional arrays or tensors. It is
+designed for cross hardware support meaning it allows exchange
+of data on devices other than the CPU (e.g. GPU).
+
+DLPack protocol had been
+`selected as the Python array API standard <https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-an-in-memory-tensor-structure>`_
+by the
+`Consortium for Python Data API Standards <https://data-apis.org/>`_
+in order to enable device aware data interchange between array/tensor
+libraries in the Python ecosystem. See more about the standard
+in the
+`protocol documentation <https://data-apis.org/array-api/latest/index.html>`_
+and more about the DLPack in the
+`Python Specification for DLPack <https://dmlc.github.io/dlpack/latest/python_spec.html#python-spec>`_.
+
+Implementation of DLPack in PyArrow
+-----------------------------------
+
+Producing side of the DLPack Protocol is implemented for ``pa.Array``

Review Comment:
   ```suggestion
   The producing side of the DLPack Protocol is implemented for ``pa.Array``
   ```


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure
+that allows exchange between major frameworks working
+with multidimensional arrays or tensors. It is
+designed for cross hardware support meaning it allows exchange
+of data on devices other than the CPU (e.g. GPU).
+
+DLPack protocol had been
+`selected as the Python array API standard <https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-an-in-memory-tensor-structure>`_
+by the
+`Consortium for Python Data API Standards <https://data-apis.org/>`_
+in order to enable device aware data interchange between array/tensor
+libraries in the Python ecosystem. See more about the standard
+in the
+`protocol documentation <https://data-apis.org/array-api/latest/index.html>`_
+and more about the DLPack in the
+`Python Specification for DLPack <https://dmlc.github.io/dlpack/latest/python_spec.html#python-spec>`_.
+
+Implementation of DLPack in PyArrow
+-----------------------------------
+
+Producing side of the DLPack Protocol is implemented for ``pa.Array``
+and can be used to interchange data between PyArrow and other tensor
+libraries. The data structures that are supported in the implementation
+of the protocol are integer, unsigned integer and float arrays. The
+protocol has no missing data support meaning PyArrow arrays with
+missing values cannot be used to transfer data through the DLPack
+protocol. Currently Arrow implementation of the protocol only supports

Review Comment:
   ```suggestion
   protocol. Currently, the Arrow implementation of the protocol only supports
   ```


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure
+that allows exchange between major frameworks working
+with multidimensional arrays or tensors. It is
+designed for cross hardware support meaning it allows exchange
+of data on devices other than the CPU (e.g. GPU).
+
+DLPack protocol had been
+`selected as the Python array API standard <https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-an-in-memory-tensor-structure>`_
+by the
+`Consortium for Python Data API Standards <https://data-apis.org/>`_
+in order to enable device aware data interchange between array/tensor
+libraries in the Python ecosystem. See more about the standard
+in the
+`protocol documentation <https://data-apis.org/array-api/latest/index.html>`_
+and more about the DLPack in the
+`Python Specification for DLPack <https://dmlc.github.io/dlpack/latest/python_spec.html#python-spec>`_.
+
+Implementation of DLPack in PyArrow
+-----------------------------------
+
+Producing side of the DLPack Protocol is implemented for ``pa.Array``
+and can be used to interchange data between PyArrow and other tensor
+libraries. The data structures that are supported in the implementation
+of the protocol are integer, unsigned integer and float arrays. The

Review Comment:
   ```suggestion
   libraries. Supported data types are integer, unsigned integer and float. The
   ```


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure

Review Comment:
   Maybe link "DLPack Protocol" to their main GitHub repository, https://github.com/dmlc/dlpack?


##########
python/pyarrow/_dlpack.pxi:
##########
@@ -0,0 +1,47 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from libc.stdlib cimport malloc, free
+
+cimport cpython
+from cpython.pycapsule cimport PyCapsule_New
+from cython import sizeof
+
+
+cdef void pycapsule_deleter(object dltensor) noexcept:

Review Comment:
   Since we also have other capsule deleters for the C Data Interface, let's name this more explicitly:
   
   ```suggestion
   cdef void dlpack_pycapsule_deleter(object dltensor) noexcept:
   ```


##########
docs/source/python/dlpack.rst:
##########
@@ -0,0 +1,93 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _pyarrow-dlpack:
+
+The DLPack Protocol
+===================
+
+The DLPack Protocol is a stable in-memory data structure
+that allows exchange between major frameworks working
+with multidimensional arrays or tensors. It is
+designed for cross hardware support meaning it allows exchange
+of data on devices other than the CPU (e.g. GPU).
+
+DLPack protocol had been
+`selected as the Python array API standard <https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-an-in-memory-tensor-structure>`_
+by the
+`Consortium for Python Data API Standards <https://data-apis.org/>`_
+in order to enable device aware data interchange between array/tensor
+libraries in the Python ecosystem. See more about the standard
+in the
+`protocol documentation <https://data-apis.org/array-api/latest/index.html>`_
+and more about the DLPack in the
+`Python Specification for DLPack <https://dmlc.github.io/dlpack/latest/python_spec.html#python-spec>`_.
+
+Implementation of DLPack in PyArrow
+-----------------------------------
+
+Producing side of the DLPack Protocol is implemented for ``pa.Array``
+and can be used to interchange data between PyArrow and other tensor
+libraries. The data structures that are supported in the implementation
+of the protocol are integer, unsigned integer and float arrays. The
+protocol has no missing data support meaning PyArrow arrays with
+missing values cannot be used to transfer data through the DLPack

Review Comment:
   ```suggestion
   missing values cannot be transferred through the DLPack
   ```


##########
python/pyarrow/array.pxi:
##########
@@ -1778,6 +1778,40 @@ cdef class Array(_PandasConvertible):
         return pyarrow_wrap_array(array)
 
+    def __dlpack__(self, stream=None):
+        """Export a primitive array as a DLPack capsule.
+
+        Parameters
+        ----------
+        stream : int, optional
+            A Python integer representing a pointer to a stream. Currently not supported.
+            Stream is provided by the consumer to the producer to instruct the producer
+            to ensure that operations can safely be performed on the array.
+
+        Returns
+        -------
+        capsule : PyCapsule
+            A DLPack capsule for the array, containing a DLPackManagedTensor.
+        """
+        if stream is None:
+            return to_dlpack(self)
+        else:
+            raise NotImplementedError(
+                "Only stream=None is supported."
+            )
+
+    def __dlpack_device__(self):
+        """
+        Performs the operation __dlpack_device__.

Review Comment:
   @AlenkaF have you checked locally that this works with pytorch (or another tensor library that might check `__dlpack_device__`)? Ideally we would have integration tests for this part as well, instead of only with numpy, but this alone is probably not worth adding a heavy test dependency on pytorch.
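
The local check asked about in the last comment could be sketched roughly as below. This is a minimal illustration and not part of the PR: `FakeCpuProducer` and `smoke_check_dlpack` are hypothetical names, the stand-in returns a placeholder object instead of a real `PyCapsule`, and the CPU device type code `1` is assumed from the `DLDeviceType` enum in the DLPack header. It only mimics the order in which a consumer would typically call `__dlpack_device__` and then `__dlpack__`.

```python
# Hypothetical smoke check; the names here are illustrative, not pyarrow API.
K_DL_CPU = 1  # assumed value of DLDeviceType.kDLCPU in dlpack.h


class FakeCpuProducer:
    """Stand-in that mimics the method contract of the reviewed Array methods."""

    def __dlpack__(self, stream=None):
        if stream is not None:
            raise NotImplementedError("Only stream=None is supported.")
        return object()  # placeholder; a real producer returns a PyCapsule

    def __dlpack_device__(self):
        return (K_DL_CPU, 0)  # (device_type, device_id)


def smoke_check_dlpack(producer):
    """Call the protocol methods in the order a consumer typically would."""
    device_type, device_id = producer.__dlpack_device__()
    if device_type != K_DL_CPU:
        raise ValueError(f"expected a CPU device, got type code {device_type}")
    # CPU data needs no stream synchronization, so pass stream=None.
    capsule = producer.__dlpack__(stream=None)
    # A producer without stream support should reject any other value.
    try:
        producer.__dlpack__(stream=1)
    except NotImplementedError:
        stream_rejected = True
    else:
        stream_rejected = False
    return device_type, device_id, capsule is not None, stream_rejected


result = smoke_check_dlpack(FakeCpuProducer())
```

With PyArrow and a tensor library installed, the same function could presumably be pointed at `pa.array([1, 2, 3])`, followed by a round trip through `np.from_dlpack` or `torch.from_dlpack`.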
