jorgecarleitao opened a new pull request #8401:
URL: https://github.com/apache/arrow/pull/8401


   This PR proposes adding support for the [C data 
interface](https://arrow.apache.org/docs/format/CDataInterface.html) by 
implementing the functionality needed to both consume and produce structs 
that follow its ABI and lifetime rules.
   
   For now this is limited to primitive types and strings (utf8), but it 
generalizes easily to all types whose data is fully encapsulated in 
`ArrayData` (things with buffers and child data). For types where this is not 
the case (such as `StructArray`, `DictionaryArray` and `ListArray`, where some 
data also resides in their specific struct implementations instead of 
`ArrayData`), I believe that more work is needed. IMO we should strive to have 
all data in `ArrayData`, as that makes it significantly easier to export arrays 
via the C Data Interface and to understand what they physically contain.
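   
   To make the current scope concrete: the C data interface identifies each 
type with a short format string in `ArrowSchema.format`. The sketch below maps 
the format strings for primitive types and utf8 to a Rust enum and back. The 
format strings come from the specification; the `SimpleType` enum and the two 
function names are hypothetical and are not taken from this PR.

```rust
/// Hypothetical enum covering only the types in scope here:
/// primitives and utf8 strings.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SimpleType {
    Boolean,
    Int8,
    UInt8,
    Int16,
    UInt16,
    Int32,
    UInt32,
    Int64,
    UInt64,
    Float32,
    Float64,
    Utf8,
}

/// Parse a C data interface format string; `None` means "not supported yet".
pub fn from_format(format: &str) -> Option<SimpleType> {
    Some(match format {
        "b" => SimpleType::Boolean,
        "c" => SimpleType::Int8,
        "C" => SimpleType::UInt8,
        "s" => SimpleType::Int16,
        "S" => SimpleType::UInt16,
        "i" => SimpleType::Int32,
        "I" => SimpleType::UInt32,
        "l" => SimpleType::Int64,
        "L" => SimpleType::UInt64,
        "f" => SimpleType::Float32,
        "g" => SimpleType::Float64,
        "u" => SimpleType::Utf8,
        _ => return None, // nested types such as "+s" (struct) are out of scope
    })
}

/// Produce the format string that a consumer would read for a given type.
pub fn to_format(data_type: SimpleType) -> &'static str {
    match data_type {
        SimpleType::Boolean => "b",
        SimpleType::Int8 => "c",
        SimpleType::UInt8 => "C",
        SimpleType::Int16 => "s",
        SimpleType::UInt16 => "S",
        SimpleType::Int32 => "i",
        SimpleType::UInt32 => "I",
        SimpleType::Int64 => "l",
        SimpleType::UInt64 => "L",
        SimpleType::Float32 => "f",
        SimpleType::Float64 => "g",
        SimpleType::Utf8 => "u",
    }
}
```

   Nested types use composite format strings (for example `+s` for structs) and 
carry child schemas, which is part of why they need more than a lookup like 
this one.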
   
   Some design choices:
   
   * import and export do not care about the type of the data that is in 
memory (previously `BufferData`, now `Bytes`) - they only care about how 
`ArrayData` is converted from and to the C data interface.
   * import wraps incoming pointers in a struct behind an `Arc`, so that we 
can refcount them thread-safely and share them between buffers, arrays, etc.
   * `export` places `Buffer`s in `private_data` for bookkeeping and releases 
them when the consumer releases the array via `release`.
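   
   A minimal sketch of the lifetime rules behind these choices, not the code in 
this PR. The struct layout follows `struct ArrowArray` from the specification; 
everything else (`FFI_ArrowArray` as a name, `PrivateData`, `ImportedArray`, 
`ImportedBuffer`, `export`, the `RELEASE_CALLS` counter, and `Vec<u8>` standing 
in for `Buffer`) is a hypothetical simplification. Buffers are treated as 
opaque bytes, so validity bitmaps and offsets are ignored.

```rust
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts `release` calls; only here so the behaviour can be observed.
pub static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Field-for-field mirror of `struct ArrowArray` from the specification.
#[repr(C)]
#[allow(non_camel_case_types, dead_code)]
pub struct FFI_ArrowArray {
    length: i64,
    null_count: i64,
    offset: i64,
    n_buffers: i64,
    n_children: i64,
    buffers: *mut *const c_void,
    children: *mut *mut FFI_ArrowArray,
    dictionary: *mut FFI_ArrowArray,
    release: Option<unsafe extern "C" fn(*mut FFI_ArrowArray)>,
    private_data: *mut c_void,
}

/// Hypothetical producer-side bookkeeping stored behind `private_data`.
struct PrivateData {
    _buffers: Vec<Vec<u8>>,         // stand-in for `Buffer`; keeps the memory alive
    pointers: Box<[*const c_void]>, // the pointer array that `buffers` refers to
}

/// Release callback: frees the bookkeeping and marks the struct as released.
unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
    if array.is_null() {
        return;
    }
    let array = unsafe { &mut *array };
    if array.release.is_none() {
        return; // already released
    }
    drop(unsafe { Box::from_raw(array.private_data as *mut PrivateData) });
    array.private_data = ptr::null_mut();
    array.buffers = ptr::null_mut();
    array.release = None; // the spec uses a null `release` to signal "released"
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Producer side: move the buffers into `private_data` and expose raw pointers.
pub fn export(length: usize, buffers: Vec<Vec<u8>>) -> FFI_ArrowArray {
    let pointers: Box<[*const c_void]> =
        buffers.iter().map(|b| b.as_ptr() as *const c_void).collect();
    let n_buffers = pointers.len() as i64;
    let mut private = Box::new(PrivateData { _buffers: buffers, pointers });
    let buffers_ptr = private.pointers.as_mut_ptr();
    FFI_ArrowArray {
        length: length as i64,
        null_count: 0,
        offset: 0,
        n_buffers,
        n_children: 0,
        buffers: buffers_ptr,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_array),
        private_data: Box::into_raw(private) as *mut c_void,
    }
}

/// Consumer side: owns the struct and calls `release` exactly once, on drop.
pub struct ImportedArray {
    array: FFI_ArrowArray,
}

impl ImportedArray {
    pub fn new(array: FFI_ArrowArray) -> Self {
        ImportedArray { array }
    }
}

impl Drop for ImportedArray {
    fn drop(&mut self) {
        if let Some(release) = self.array.release {
            unsafe { release(&mut self.array) };
        }
    }
}

/// A view over one foreign buffer; the `Arc` keeps the producer's memory alive.
pub struct ImportedBuffer {
    owner: Arc<ImportedArray>,
    index: usize,
    len: usize, // trusted here; real code would derive it from the data type
}

impl ImportedBuffer {
    pub fn new(owner: Arc<ImportedArray>, index: usize, len: usize) -> Self {
        ImportedBuffer { owner, index, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        assert!((self.index as i64) < self.owner.array.n_buffers);
        unsafe {
            let data = *self.owner.array.buffers.add(self.index) as *const u8;
            if data.is_null() { &[] } else { std::slice::from_raw_parts(data, self.len) }
        }
    }
}
```

   In this sketch, several `ImportedBuffer`s can share one `Arc<ImportedArray>`, 
and the producer's `release` callback runs only when the last of them is 
dropped. Sharing across threads would additionally require `Send`/`Sync` 
implementations, which are left out here.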
   
   I do not expect this PR to be easy to review, as it is touching sensitive 
(aka `unsafe`) code. However, based on the tests I did so far, I am 
sufficiently happy to PR it.
   
   This PR has three main parts:
   
   1. Addition of an `ffi` module that contains the import and export 
functionality
   2. Addition of some helpers to import and export an `Array` from and to 
the C Data Interface
   3. A crate to test this against Python/C++'s API
   
   It also does a small refactor of `BufferData`, renaming it to `Bytes` 
(motivated by the popular `bytes` crate), and moving it to a separate file.
   
   What is tested:
   
   * round-trip `Python -> Rust -> Python` (new separate crate, 
`arrow-c-integration`)
   * round-trip `Rust -> Python -> Rust` (new separate crate, 
`arrow-c-integration`)
   * round-trip `Rust -> Rust -> Rust`
   * memory allocation counts
   
   Finally, this PR includes a large contribution from @pitrou, who took _a 
lot_ of his time to explain to me how the C++ implementation does it and the 
main things that I had to worry about here.


