alamb commented on a change in pull request #8401:
URL: https://github.com/apache/arrow/pull/8401#discussion_r522108476



##########
File path: rust/arrow-c-integration/README.md
##########
@@ -0,0 +1,57 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Arrow c integration
+
+This is a Rust crate that tests compatibility between Rust's Arrow implementation and PyArrow.

Review comment:
       After reading the code for a while, the name of this crate,
`arrow-c-integration`, seems slightly misleading: it is a Python binding (via C)
rather than the C integration. In other words, it is a *user* of the C
integration bindings rather than the integration itself.
   
   Maybe a name like `arrow-rust-python-bindings` would better indicate
what it is doing.

##########
File path: rust/arrow/src/array/ffi.rs
##########
@@ -0,0 +1,121 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Contains functionality to load an ArrayData from the C Data Interface

Review comment:
       ```suggestion
   //! Contains functionality to load data to/from `ArrayData` from the C Data Interface
   ```

##########
File path: rust/arrow/src/array/ffi.rs
##########
@@ -0,0 +1,121 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Contains functionality to load an ArrayData from the C Data Interface
+
+use std::convert::TryFrom;
+
+use crate::{
+    error::{ArrowError, Result},
+    ffi,
+};
+
+use super::ArrayData;
+
+impl TryFrom<ffi::ArrowArray> for ArrayData {
+    type Error = ArrowError;
+
+    fn try_from(value: ffi::ArrowArray) -> Result<Self> {
+        let data_type = value.data_type()?;
+        let len = value.len();
+        let offset = value.offset();
+        let null_count = value.null_count();
+        let buffers = value.buffers()?;
+        let null_bit_buffer = value.null_bit_buffer();
+
+        // todo: no child data yet...
+        Ok(ArrayData::new(
+            data_type,
+            len,
+            Some(null_count),
+            null_bit_buffer,
+            offset,
+            buffers,
+            vec![],
+        ))
+    }
+}
+
+impl TryFrom<ArrayData> for ffi::ArrowArray {
+    type Error = ArrowError;
+
+    fn try_from(value: ArrayData) -> Result<Self> {
+        let len = value.len();
+        let offset = value.offset() as usize;
+        let null_count = value.null_count();
+        let buffers = value.buffers().to_vec();
+        let null_buffer = value.null_buffer().cloned();
+
+        // todo: no child data yet...

Review comment:
       I think it might be worth `assert!`ing that there are no child data
arrays so we don't have a silent (and hard to debug) failure.
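
A minimal sketch of such a guard, assuming a hypothetical `ArrayDataSketch` stand-in rather than the real `ArrayData` (the struct, field, and function names are illustrative, not the crate's API). It returns an error instead of panicking, which would fit the `TryFrom` signature in the diff; a plain `assert!` would be the shorter alternative.

```rust
/// Hypothetical stand-in for `ArrayData`; only the field relevant here.
#[derive(Debug, Default)]
struct ArrayDataSketch {
    child_data: Vec<ArrayDataSketch>,
}

/// Refuses to export data with children instead of silently dropping them.
fn check_no_children(data: &ArrayDataSketch) -> Result<(), String> {
    if data.child_data.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "exporting arrays with child data is not supported yet (found {} children)",
            data.child_data.len()
        ))
    }
}

fn main() {
    let flat = ArrayDataSketch::default();
    assert!(check_no_children(&flat).is_ok());

    let nested = ArrayDataSketch {
        child_data: vec![ArrayDataSketch::default()],
    };
    assert!(check_no_children(&nested).is_err());
}
```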

##########
File path: rust/arrow-c-integration/src/lib.rs
##########
@@ -0,0 +1,162 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This library demonstrates a minimal usage of Rust's C data interface to pass
+//! arrays from and to Python.
+
+use std::error;
+use std::fmt;
+use std::sync::Arc;
+
+use pyo3::exceptions::PyOSError;
+use pyo3::wrap_pyfunction;
+use pyo3::{libc::uintptr_t, prelude::*};
+
+use arrow::array::{make_array_from_raw, ArrayRef, Int64Array};
+use arrow::compute::kernels;
+use arrow::error::ArrowError;
+use arrow::ffi;
+
+/// an error that bridges ArrowError with a Python error
+#[derive(Debug)]
+enum PyO3ArrowError {
+    ArrowError(ArrowError),
+}
+
+impl fmt::Display for PyO3ArrowError {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        match *self {
+            PyO3ArrowError::ArrowError(ref e) => e.fmt(f),
+        }
+    }
+}
+
+impl error::Error for PyO3ArrowError {
+    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
+        match *self {
+            // The cause is the underlying implementation error type. It is implicitly
+            // cast to the trait object `&error::Error`. This works because the
+            // underlying type already implements the `Error` trait.
+            PyO3ArrowError::ArrowError(ref e) => Some(e),
+        }
+    }
+}
+
+impl From<ArrowError> for PyO3ArrowError {
+    fn from(err: ArrowError) -> PyO3ArrowError {
+        PyO3ArrowError::ArrowError(err)
+    }
+}
+
+impl From<PyO3ArrowError> for PyErr {
+    fn from(err: PyO3ArrowError) -> PyErr {
+        PyOSError::new_err(err.to_string())
+    }
+}
+
+fn to_rust(ob: PyObject, py: Python) -> PyResult<ArrayRef> {
+    // prepare a pointer to receive the Array struct
+    let (array_pointer, schema_pointer) =
+        ffi::ArrowArray::into_raw(unsafe { ffi::ArrowArray::empty() });
+
+    // make the conversion through PyArrow's private API

Review comment:
       I think ensuring this crate's tests are run regularly / via CI would be 
a good way to guard against breaking changes here

##########
File path: rust/arrow/src/array/ffi.rs
##########
@@ -0,0 +1,121 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Contains functionality to load an ArrayData from the C Data Interface
+
+use std::convert::TryFrom;
+
+use crate::{
+    error::{ArrowError, Result},
+    ffi,
+};
+
+use super::ArrayData;
+
+impl TryFrom<ffi::ArrowArray> for ArrayData {
+    type Error = ArrowError;
+
+    fn try_from(value: ffi::ArrowArray) -> Result<Self> {
+        let data_type = value.data_type()?;
+        let len = value.len();
+        let offset = value.offset();
+        let null_count = value.null_count();
+        let buffers = value.buffers()?;
+        let null_bit_buffer = value.null_bit_buffer();
+
+        // todo: no child data yet...
+        Ok(ArrayData::new(
+            data_type,
+            len,
+            Some(null_count),
+            null_bit_buffer,
+            offset,
+            buffers,
+            vec![],
+        ))
+    }
+}
+
+impl TryFrom<ArrayData> for ffi::ArrowArray {
+    type Error = ArrowError;
+
+    fn try_from(value: ArrayData) -> Result<Self> {
+        let len = value.len();
+        let offset = value.offset() as usize;
+        let null_count = value.null_count();
+        let buffers = value.buffers().to_vec();
+        let null_buffer = value.null_buffer().cloned();
+
+        // todo: no child data yet...
+        unsafe {
+            ffi::ArrowArray::try_new(
+                value.data_type(),
+                len,
+                null_count,
+                null_buffer,
+                offset,
+                buffers,
+                vec![],
+            )
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::error::Result;
+    use crate::{
+        array::{Array, ArrayData, Int64Array, UInt32Array, UInt64Array},
+        ffi::ArrowArray,
+    };
+    use std::convert::TryFrom;
+
+    fn test_round_trip(expected: &ArrayData) -> Result<()> {
+        // create a `ArrowArray` from the data.
+        let d1 = ArrowArray::try_from(expected.clone())?;
+
+        // here we export the array as 2 pointers. We would have no control over ownership if it was not for
+        // the release mechanism.
+        let (array, schema) = ArrowArray::into_raw(d1);
+
+        // simulate an external consumer by being the consumer
+        let d1 = unsafe { ArrowArray::try_from_raw(array, schema) }?;
+
+        let result = &ArrayData::try_from(d1)?;
+
+        assert_eq!(result, expected);
+        Ok(())
+    }
+
+    #[test]
+    fn test_u32() -> Result<()> {
+        let data = UInt32Array::from(vec![2]).data();
+        test_round_trip(data.as_ref())
+    }
+
+    #[test]
+    fn test_u64() -> Result<()> {
+        let data = UInt64Array::from(vec![2]).data();
+        test_round_trip(data.as_ref())
+    }
+
+    #[test]
+    fn test_i64() -> Result<()> {
+        let data = Int64Array::from(vec![2]).data();

Review comment:
       I suggest at least 2 elements (and maybe a `None`) in each of these
arrays.
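
To illustrate why a `None` matters: a single non-null value never exercises the validity bitmap or the null count. The sketch below uses only the standard library and hypothetical `pack`/`unpack` helpers (not the crate's API) to round-trip optional values through a values buffer plus a least-significant-bit-first validity bitmap.

```rust
/// Splits optional values into a values buffer, a validity bitmap
/// (bit i set means slot i is valid, least-significant bit first) and a null count.
fn pack(values: &[Option<i64>]) -> (Vec<i64>, Vec<u8>, usize) {
    let mut data = Vec::with_capacity(values.len());
    let mut validity = vec![0u8; (values.len() + 7) / 8];
    let mut null_count = 0;
    for (i, v) in values.iter().enumerate() {
        match v {
            Some(x) => {
                data.push(*x);
                validity[i / 8] |= 1u8 << (i % 8);
            }
            None => {
                data.push(0); // placeholder; masked out by the bitmap
                null_count += 1;
            }
        }
    }
    (data, validity, null_count)
}

/// Rebuilds the optional values from the two buffers.
fn unpack(data: &[i64], validity: &[u8]) -> Vec<Option<i64>> {
    data.iter()
        .enumerate()
        .map(|(i, x)| {
            if (validity[i / 8] & (1u8 << (i % 8))) != 0 {
                Some(*x)
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let input: Vec<Option<i64>> = vec![Some(2), None, Some(1)];
    let (data, validity, null_count) = pack(&input);
    assert_eq!(null_count, 1);
    assert_eq!(validity, vec![0b0000_0101]);
    assert_eq!(unpack(&data, &validity), input);
}
```

With only `vec![2]` as input, the bitmap and null-count paths would pass trivially, which is the gap the comment points at.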

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -151,70 +80,52 @@ impl Buffer {
     ///
     /// * `ptr` - Pointer to raw parts
     /// * `len` - Length of raw parts in **bytes**
-    /// * `capacity` - Total allocated memory for the pointer `ptr`, in **bytes**
+    /// * `data` - An [ffi::FFI_ArrowArray] with the data
     ///
     /// # Safety
     ///
     /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
-    /// bytes. If the `ptr` and `capacity` come from a `Buffer`, then this is guaranteed.
-    pub unsafe fn from_unowned(ptr: *const u8, len: usize, capacity: usize) -> Self {
-        Buffer::build_with_arguments(ptr, len, capacity, false)
+    /// bytes and that the foreign deallocator frees the region.
+    pub unsafe fn from_unowned(
+        ptr: *const u8,
+        len: usize,
+        data: Arc<ffi::FFI_ArrowArray>,
+    ) -> Self {
+        Buffer::build_with_arguments(ptr, len, Deallocation::Foreign(data))
     }
 
-    /// Creates a buffer from an existing memory region (must already be byte-aligned).
-    ///
-    /// # Arguments
-    ///
-    /// * `ptr` - Pointer to raw parts
-    /// * `len` - Length of raw parts in bytes
-    /// * `capacity` - Total allocated memory for the pointer `ptr`, in **bytes**
-    /// * `owned` - Whether the raw parts is owned by this `Buffer`. If true, this `Buffer` will
-    /// free this memory when dropped, otherwise it will skip freeing the raw parts.
-    ///
-    /// # Safety
-    ///
-    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
-    /// bytes. If the `ptr` and `capacity` come from a `Buffer`, then this is guaranteed.
+    /// Auxiliary method to create a new Buffer
     unsafe fn build_with_arguments(
         ptr: *const u8,
         len: usize,
-        capacity: usize,
-        owned: bool,
+        deallocation: Deallocation,
     ) -> Self {
-        assert!(
-            memory::is_aligned(ptr, memory::ALIGNMENT),

Review comment:
       I wonder why we dropped the alignment assertion
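
For reference, a sketch of what such an alignment check does, using only the standard library. The `is_aligned` helper and the `ALIGNMENT` value of 64 are assumptions for illustration; the crate's `memory::is_aligned` and `memory::ALIGNMENT` may differ. One possible reason for dropping the assertion (a guess, not something the diff states) is that foreign buffers are not allocated by Rust's aligned allocator.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Assumed alignment for illustration; the crate's `memory::ALIGNMENT` may differ.
const ALIGNMENT: usize = 64;

/// Returns true when the address of `ptr` is a multiple of `alignment` (a power of two).
fn is_aligned(ptr: *const u8, alignment: usize) -> bool {
    debug_assert!(alignment.is_power_of_two());
    ((ptr as usize) & (alignment - 1)) == 0
}

fn main() {
    let layout = Layout::from_size_align(128, ALIGNMENT).unwrap();
    // SAFETY: the layout has non-zero size, and the pointer is freed with the same layout.
    unsafe {
        let ptr = alloc(layout);
        assert!(!ptr.is_null());
        assert!(is_aligned(ptr, ALIGNMENT));
        // One byte past a 64-byte-aligned address cannot itself be 64-byte aligned.
        assert!(!is_aligned(ptr.add(1), ALIGNMENT));
        dealloc(ptr, layout);
    }
}
```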

##########
File path: rust/arrow/src/bytes.rs
##########
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module contains an implementation of a contiguous immutable memory region that knows
+//! how to de-allocate itself, [`Bytes`].
+//! Note that this is a low-level functionality of this crate.
+
+use core::slice;
+use std::sync::Arc;
+use std::{fmt::Debug, fmt::Formatter};
+
+use crate::{ffi, memory};
+
+/// Mode of deallocating memory regions
+pub enum Deallocation {
+    /// Native deallocation, using Rust deallocator with Arrow-specific memory alignment
+    Native(usize),
+    /// Foreign interface, via a callback
+    Foreign(Arc<ffi::FFI_ArrowArray>),
+}
+
+impl Debug for Deallocation {
+    fn fmt(&self, f: &mut Formatter) -> std::fmt::Result {
+        match self {
+            Deallocation::Native(capacity) => {
+                write!(f, "Deallocation::Native {{ capacity: {} }}", capacity)
+            }
+            Deallocation::Foreign(_) => {
+                write!(f, "Deallocation::Foreign {{ capacity: unknown }}")
+            }
+        }
+    }
+}
+
+/// A continuous, fixed-size, immutable memory region that knows how to de-allocate itself.
+/// This struct's API is inspired by `bytes::Bytes`, but it is not limited to using Rust's
+/// global allocator nor u8 alignment.
+///
+/// In the most common case, this buffer is allocated using [`allocate_aligned`](memory::allocate_aligned)
+/// and deallocated accordingly [`free_aligned`](memory::free_aligned).
+/// When the region is allocated by a foreign allocator, [Deallocation::Foreign], this calls the
+/// foreign deallocator to deallocate the region when it is no longer needed.
+pub struct Bytes {
+    /// The raw pointer to the beginning of the region
+    ptr: *const u8,
+
+    /// The number of bytes visible to this region. This is always smaller than its capacity (when available).
+    len: usize,
+
+    /// how to deallocate this region
+    deallocation: Deallocation,
+}
+
+impl Bytes {
+    /// Takes ownership of an allocated memory region.
+    ///
+    /// # Arguments
+    ///
+    /// * `ptr` - Pointer to raw parts
+    /// * `len` - Length of raw parts in **bytes**
+    /// * `deallocation` - How the region is deallocated: natively (carrying the total
+    ///   allocated capacity, in **bytes**) or via a foreign release callback
+    ///
+    /// # Safety
+    ///
+    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+    /// bytes. If the `ptr` and `capacity` come from a `Buffer`, then this is guaranteed.
+    pub unsafe fn new(ptr: *const u8, len: usize, deallocation: Deallocation) -> Bytes {
+        Bytes {
+            ptr,
+            len,
+            deallocation,
+        }
+    }
+
+    #[inline]
+    pub fn as_slice(&self) -> &[u8] {
+        unsafe { slice::from_raw_parts(self.ptr, self.len) }
+    }
+
+    #[inline]
+    pub fn len(&self) -> usize {
+        self.len
+    }
+
+    #[inline]
+    pub fn is_empty(&self) -> bool {
+        self.len == 0
+    }
+
+    #[inline]
+    pub fn raw_data(&self) -> *const u8 {
+        self.ptr
+    }
+
+    #[inline]
+    pub fn raw_data_mut(&mut self) -> *mut u8 {
+        self.ptr as *mut u8
+    }
+
+    pub fn capacity(&self) -> usize {
+        match self.deallocation {
+            Deallocation::Native(capacity) => capacity,
+            // we cannot determine this in general,
+            // and thus we state that this is externally-owned memory
+            Deallocation::Foreign(_) => 0,
+        }
+    }
+}
+
+impl Drop for Bytes {
+    #[inline]
+    fn drop(&mut self) {
+        match &self.deallocation {
+            Deallocation::Native(capacity) => {
+                if !self.ptr.is_null() {
+                    unsafe { memory::free_aligned(self.ptr as *mut u8, *capacity) };
+                }
+            }
+            // foreign interface knows how to deallocate itself.
+            Deallocation::Foreign(_) => (),

Review comment:
       Is the idea that the code at
`https://github.com/apache/arrow/pull/8401/files#diff-539f116862a6cea16ae65b6a031927a23fb3da6a1ee0223d517215fc83bf4a7aR157`
 gets invoked once all `Arc`s to this memory are dropped? That makes sense to me;
I just want to double-check that I understood the thinking correctly.
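
The behaviour being asked about can be sketched with the standard library alone. `ForeignArray` and `BufferSketch` below are hypothetical stand-ins for `FFI_ArrowArray` and a buffer holding `Deallocation::Foreign(Arc<..>)`; the counter stands in for the C `release` callback. Under those assumptions, `Drop` runs exactly once, when the last `Arc` goes away.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `FFI_ArrowArray`: counts how often it is released.
struct ForeignArray {
    released: Arc<AtomicUsize>,
}

impl Drop for ForeignArray {
    fn drop(&mut self) {
        // In the real struct this is where the C `release` callback would be called.
        self.released.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for a buffer that keeps the foreign array alive.
struct BufferSketch {
    _owner: Arc<ForeignArray>,
}

/// Returns the release count after dropping the first and then the last buffer.
fn release_count_after_drops() -> (usize, usize) {
    let released = Arc::new(AtomicUsize::new(0));
    let array = Arc::new(ForeignArray {
        released: released.clone(),
    });
    let a = BufferSketch { _owner: array.clone() };
    let b = BufferSketch { _owner: array };
    drop(a);
    let after_first = released.load(Ordering::SeqCst);
    drop(b);
    let after_last = released.load(Ordering::SeqCst);
    (after_first, after_last)
}

fn main() {
    // Nothing is released while one buffer still holds the Arc.
    assert_eq!(release_count_after_drops(), (0, 1));
}
```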

##########
File path: rust/arrow/src/ffi.rs
##########
@@ -0,0 +1,657 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Contains declarations to bind to the [C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html).
+//!
+//! Generally, this module is divided into two main interfaces:
+//! One interface maps the C ABI to native Rust types, i.e. it converts C pointers and `c_char` to native Rust.
+//! This is handled by [FFI_ArrowSchema] and [FFI_ArrowArray].
+//!
+//! The second interface maps native Rust types to the Rust-specific implementation of Arrow such as `format` to [DataType],
+//! `Buffer`, etc. This is handled by [ArrowArray].
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{Int32Array, Array, ArrayData, make_array_from_raw};
+//! # use arrow::error::{Result, ArrowError};
+//! # use arrow::compute::kernels::arithmetic;
+//! # use std::convert::TryFrom;
+//! # fn main() -> Result<()> {
+//! // create an array natively
+//! let array = Int32Array::from(vec![Some(1), None, Some(3)]);
+//!
+//! // export it
+//! let (array_ptr, schema_ptr) = array.to_raw()?;
+//!
+//! // consumed and used by something else...
+//!
+//! // import it
+//! let array = unsafe { make_array_from_raw(array_ptr, schema_ptr)? };
+//!
+//! // perform some operation
+//! let array = array.as_any().downcast_ref::<Int32Array>().ok_or(
+//!     ArrowError::ParseError("Expects an int32".to_string()),
+//! )?;
+//! let array = arithmetic::add(&array, &array)?;
+//!
+//! // verify
+//! assert_eq!(array, Int32Array::from(vec![Some(2), None, Some(6)]));
+//!
+//! // (drop/release)
+//! # Ok(())
+//! # }
+//! ```
+
+/*
+# Design:
+
+Main assumptions:
+* A memory region is deallocated according to its own release mechanism.
+* Rust shares memory regions between arrays.
+* A memory region should be deallocated when no-one is using it.
+
+The design of this module is as follows:
+
+`ArrowArray` contains two `Arc`s, one per ABI-compatible `struct`, each containing data
+according to the C Data Interface. These Arcs are used for ref counting of the structs
+within Rust and lifetime management.
+
+Each ABI-compatible `struct` knows how to `drop` itself, calling `release`.
+
+To import an array, unsafely create an `ArrowArray` from two pointers using [ArrowArray::try_from_raw].
+To export an array, create an `ArrowArray` using [ArrowArray::try_new].
+*/
+
+use std::{ffi::CStr, ffi::CString, iter, mem::size_of, ptr, sync::Arc};
+
+use crate::buffer::Buffer;
+use crate::datatypes::DataType;
+use crate::error::{ArrowError, Result};
+use crate::util::bit_util;
+
+/// ABI-compatible struct for `ArrowSchema` from C Data Interface
+/// See https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions
+/// This was created by bindgen
+#[repr(C)]
+#[derive(Debug)]
+pub struct FFI_ArrowSchema {
+    format: *const ::std::os::raw::c_char,
+    name: *const ::std::os::raw::c_char,
+    metadata: *const ::std::os::raw::c_char,
+    flags: i64,
+    n_children: i64,
+    children: *mut *mut FFI_ArrowSchema,
+    dictionary: *mut FFI_ArrowSchema,
+    release: ::std::option::Option<unsafe extern "C" fn(arg1: *mut FFI_ArrowSchema)>,
+    private_data: *mut ::std::os::raw::c_void,
+}
+
+// callback used to drop [FFI_ArrowSchema] when it is exported.
+unsafe extern "C" fn release_schema(schema: *mut FFI_ArrowSchema) {
+    let schema = &mut *schema;
+
+    // take ownership back to release it.
+    CString::from_raw(schema.format as *mut std::os::raw::c_char);
+
+    schema.release = None;
+}
+
+impl FFI_ArrowSchema {
+    /// create a new [FFI_ArrowSchema] from a format.
+    fn new(format: &str) -> FFI_ArrowSchema {
+        // https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema
+        FFI_ArrowSchema {
+            format: CString::new(format).unwrap().into_raw(),
+            name: std::ptr::null_mut(),
+            metadata: std::ptr::null_mut(),
+            flags: 0,
+            n_children: 0,
+            children: ptr::null_mut(),
+            dictionary: std::ptr::null_mut(),
+            release: Some(release_schema),
+            private_data: std::ptr::null_mut(),
+        }
+    }
+
+    /// create an empty [FFI_ArrowSchema]
+    fn empty() -> Self {
+        Self {
+            format: std::ptr::null_mut(),
+            name: std::ptr::null_mut(),
+            metadata: std::ptr::null_mut(),
+            flags: 0,
+            n_children: 0,
+            children: ptr::null_mut(),
+            dictionary: std::ptr::null_mut(),
+            release: None,
+            private_data: std::ptr::null_mut(),
+        }
+    }
+
+    /// returns the format of this schema.
+    pub fn format(&self) -> &str {
+        unsafe { CStr::from_ptr(self.format) }
+            .to_str()
+            .expect("The external API has a non-utf8 as format")
+    }
+}
+
+impl Drop for FFI_ArrowSchema {
+    fn drop(&mut self) {
+        match self.release {
+            None => (),
+            Some(release) => unsafe { release(self) },
+        };
+    }
+}
+
+/// maps a DataType `format` to a [DataType](arrow::datatypes::DataType).
+/// See https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings
+fn to_datatype(format: &str) -> Result<DataType> {
+    Ok(match format {
+        "n" => DataType::Null,
+        "b" => DataType::Boolean,
+        "c" => DataType::Int8,
+        "C" => DataType::UInt8,
+        "s" => DataType::Int16,
+        "S" => DataType::UInt16,
+        "i" => DataType::Int32,
+        "I" => DataType::UInt32,
+        "l" => DataType::Int64,
+        "L" => DataType::UInt64,
+        "e" => DataType::Float16,
+        "f" => DataType::Float32,
+        "g" => DataType::Float64,
+        "z" => DataType::Binary,
+        "Z" => DataType::LargeBinary,
+        "u" => DataType::Utf8,
+        "U" => DataType::LargeUtf8,
+        _ => {
+            return Err(ArrowError::CDataInterface(format!(
+                "The datatype \"{}\" is still not supported in Rust implementation",
+                format
+            )))
+        }
+    })
+}
+
+/// the inverse of [to_datatype]
+fn from_datatype(datatype: &DataType) -> Result<String> {
+    Ok(match datatype {
+        DataType::Null => "n",
+        DataType::Boolean => "b",
+        DataType::Int8 => "c",
+        DataType::UInt8 => "C",
+        DataType::Int16 => "s",
+        DataType::UInt16 => "S",
+        DataType::Int32 => "i",
+        DataType::UInt32 => "I",
+        DataType::Int64 => "l",
+        DataType::UInt64 => "L",
+        DataType::Float16 => "e",
+        DataType::Float32 => "f",
+        DataType::Float64 => "g",
+        DataType::Binary => "z",
+        DataType::LargeBinary => "Z",
+        DataType::Utf8 => "u",
+        DataType::LargeUtf8 => "U",
+        _ => {
+            return Err(ArrowError::CDataInterface(format!(
+                "The datatype \"{:?}\" is still not supported in Rust implementation",
+                datatype
+            )))
+        }
+    }
+    .to_string())
+}
+
+// returns the number of bits that buffer `i` (in the C data interface) is expected to have.
+// This is set by the Arrow specification
+fn bit_width(data_type: &DataType, i: usize) -> Result<usize> {
+    Ok(match (data_type, i) {
+        // the null buffer is bit sized
+        (_, 0) => 1,
+        // primitive types first buffer's size is given by the native types
+        (DataType::Boolean, 1) => 1,
+        (DataType::UInt8, 1) => size_of::<u8>() * 8,
+        (DataType::UInt16, 1) => size_of::<u16>() * 8,
+        (DataType::UInt32, 1) => size_of::<u32>() * 8,
+        (DataType::UInt64, 1) => size_of::<u64>() * 8,
+        (DataType::Int8, 1) => size_of::<i8>() * 8,
+        (DataType::Int16, 1) => size_of::<i16>() * 8,
+        (DataType::Int32, 1) => size_of::<i32>() * 8,
+        (DataType::Int64, 1) => size_of::<i64>() * 8,
+        (DataType::Float32, 1) => size_of::<f32>() * 8,
+        (DataType::Float64, 1) => size_of::<f64>() * 8,
+        // primitive types have a single buffer
+        (DataType::Boolean, _) |
+        (DataType::UInt8, _) |
+        (DataType::UInt16, _) |
+        (DataType::UInt32, _) |
+        (DataType::UInt64, _) |
+        (DataType::Int8, _) |
+        (DataType::Int16, _) |
+        (DataType::Int32, _) |
+        (DataType::Int64, _) |
+        (DataType::Float32, _) |
+        (DataType::Float64, _) => {
+            return Err(ArrowError::CDataInterface(format!(
+                "The datatype \"{:?}\" expects 2 buffers, but requested {}. Please verify that the C data interface is correctly implemented.",
+                data_type, i
+            )))
+        }
+        // Variable-sized binaries: have two buffers.
+        // Utf8: first buffer is i32, second is in bytes
+        (DataType::Utf8, 1) => size_of::<i32>() * 8,
+        (DataType::Utf8, 2) => size_of::<u8>() * 8,
+        (DataType::Utf8, _) => {
+            return Err(ArrowError::CDataInterface(format!(
+                "The datatype \"{:?}\" expects 3 buffers, but requested {}. Please verify that the C data interface is correctly implemented.",

Review comment:
       This error seems off: the code appears to expect 1 or 2
buffers, but the error says it expects 3.
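
Whether "3" is off may depend on whether the validity bitmap is counted: the match arms in the diff treat index 0 as the null buffer for every type, so indices 1 and 2 would be the second and third buffers. One way to make the count unambiguous is to derive both the lookup and the error message from a single table. The sketch below is standard-library only; `TypeSketch`, `buffer_names`, and `describe` are hypothetical names, not the crate's API.

```rust
/// Hypothetical, simplified type tag; not the crate's `DataType`.
#[derive(Debug, Clone, Copy)]
enum TypeSketch {
    Int32,
    Utf8,
}

/// Buffers per layout, counting the validity bitmap at index 0.
fn buffer_names(t: TypeSketch) -> &'static [&'static str] {
    match t {
        TypeSketch::Int32 => &["validity", "values"],
        TypeSketch::Utf8 => &["validity", "offsets (i32)", "data (u8)"],
    }
}

/// Names buffer `i`, or reports the expected count derived from the same table.
fn describe(t: TypeSketch, i: usize) -> Result<&'static str, String> {
    let names = buffer_names(t);
    names.get(i).copied().ok_or_else(|| {
        format!(
            "{:?} expects {} buffers (indices 0..={}), but buffer {} was requested",
            t,
            names.len(),
            names.len() - 1,
            i
        )
    })
}

fn main() {
    assert_eq!(describe(TypeSketch::Utf8, 2), Ok("data (u8)"));
    assert!(describe(TypeSketch::Int32, 2).is_err());
}
```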

##########
File path: rust/arrow-c-integration/tests/test_sql.py
##########
@@ -0,0 +1,61 @@
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+
+import pyarrow
+import arrow_c_integration
+
+
+class TestCase(unittest.TestCase):
+    def test_primitive_python(self):
+        """
+        Python -> Rust -> Python
+        """
+        old_allocated = pyarrow.total_allocated_bytes()
+        a = pyarrow.array([1, 2, 3])
+        b = arrow_c_integration.double(a)
+        self.assertEqual(b, pyarrow.array([2, 4, 6]))

Review comment:
       nice





