frmnboi opened a new issue #10488: URL: https://github.com/apache/arrow/issues/10488
I'm trying to write a C++ extension to add a new column to a table I have. I create the table with pyarrow in Python, but I want to call a function in C++ to operate on the data, in-place if possible. Currently, I have this:

**helperfuncs.cpp**
```
#include <pybind11/pybind11.h>
#include <Python.h>
#include <arrow/python/pyarrow.h>
#include <arrow/array/builder_primitive.h>

//arrow::Array;
//arrow::ChunkedArray;

std::shared_ptr<arrow::DoubleArray> vol_adj_close(std::shared_ptr<arrow::DoubleArray>& close,std::shared_ptr<arrow::Int64Array>& volume)
{
    // auto close=std::static_pointer_cast<arrow::DoubleArray>(closeraw);
    // auto volume=std::static_pointer_cast<arrow::DoubleArray>(volumeraw);
    if (close->length()!=volume->length())
        throw std::length_error("Arrays are not of equal length");
    arrow::DoubleBuilder builder;
    arrow::Status status = builder.Resize(close->length());
    if (!status.ok()) {
        throw std::bad_alloc();
    }
    for(int i = 0; i < volume->length(); i++) {
        builder.UnsafeAppend(close->Value(i) / volume->Value(i));
    }
    std::shared_ptr<arrow::DoubleArray> array;
    arrow::Status st = builder.Finish(&array);
    if (!st.ok()) {
        throw std::bad_alloc();
    }
    return array;
}

// int import_pyarrow()
// {
//     return arrow::py::import_pyarrow();
// }

PYBIND11_MODULE(helperfuncs, m) {
    //arrow::py::import_pyarrow();
    m.doc() = "Pyarrow Extensions";
    m.def("vol_adj_close", &vol_adj_close, pybind11::call_guard<pybind11::gil_scoped_release>());
    //m.def("import_pyarrow",&import_pyarrow);
}
```

This was taken from the one example I could find of Pybind11 and Pyarrow working together: [https://github.com/vaexio/vaex-arrow-ext](https://github.com/vaexio/vaex-arrow-ext)

I compile this using CMake with the following excerpt:

**CMake**
```
find_package(PythonInterp REQUIRED)
include_directories(${PYTHON_INCLUDE_DIRS})
add_subdirectory(.../pybind11/)
pybind11_add_module(helperfuncs helperfuncs.cpp MODULE)
add_compile_options(-std=c++20 -O2 -shared -fPIC)
find_package(Arrow REQUIRED)
target_include_directories(helperfuncs PUBLIC .../Python_venv_3.8.5/lib/python3.8/site-packages/pyarrow/include)
target_link_libraries(helperfuncs PRIVATE arrow_shared) #arrow_static
```

and call it in Python with the following excerpt:

**test.py**
```
import helperfuncs
voladjclose=helperfuncs.vol_adj_close(data['close'].combine_chunks(),data['volume'].combine_chunks())
```

where the unchunked **data['close']** is a pyarrow.lib.DoubleArray object and the unchunked **data['volume']** is a pyarrow.lib.Int64Array object.

Using CMake, this code compiles to a shared library and can be successfully imported into Python as the **helperfuncs** module. However, two issues arise:

1. The commented lines in the PYBIND11_MODULE function were failed attempts at running the **import_pyarrow** function required for C++ extensions. I assume this is a C++ object, as there does seem to be a **py::import_pyarrow()** function, but imports into Python fail due to a linker issue: **ImportError: .../helperfuncs.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN5arrow2py14import_pyarrowEv**
2. Trying to run vol_adj_close() directly on the data series, as in the Python snippet, gives the following type error: **vol_adj_close(): incompatible function arguments. The following argument types are supported: 1. (arg0: arrow::NumericArray<arrow::DoubleType>, arg1: arrow::NumericArray<arrow::Int64Type>) -> arrow::NumericArray<arrow::DoubleType>**

This one confuses me greatly, because what I see from the documentation and from testing is:

pa.Array ----------------> <class 'pyarrow.lib.Array'>
pa.NumericArray -----> <class 'pyarrow.lib.NumericArray'>

The [documentation](https://arrow.apache.org/docs/python/generated/pyarrow.NumericArray.html?highlight=numericarray#pyarrow.NumericArray) seems to indicate that a NumericArray is a specific type of Array, so an implicit conversion should not be causing an issue.
I do not see any way to convert an Array to a NumericArray or vice versa in the documentation either. Is there a difference between Python's **pyarrow.lib.DoubleArray** and C++'s **arrow::NumericArray<arrow::DoubleType>**?

On a final note, I know that pyarrow has an operation for element-wise division, which is all I am doing here, but in this case I am trying to see whether I can get a C++ extension up and running for more complex problems.
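For reference, the mangled name in the ImportError from issue 1 can be decoded to confirm which function the loader could not resolve. The following is a minimal sketch, assuming GCC or Clang on Linux (Itanium C++ ABI, which provides `abi::__cxa_demangle` in `<cxxabi.h>`); the helper name `demangle` and the `FreeDeleter` struct are hypothetical and are not part of Arrow or pybind11.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)

#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: releases the malloc'd buffer returned by __cxa_demangle.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: returns the human-readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    if (status == 0 && text) {
        return std::string(text.get());
    }
    return mangled;
}
```

Calling `demangle("_ZN5arrow2py14import_pyarrowEv")` gives `arrow::py::import_pyarrow()`, so the missing symbol is the function declared in `arrow/python/pyarrow.h`. Since the CMake excerpt links only `arrow_shared`, one plausible reading is that the library defining the `arrow::py` functions is not being linked, though that is an inference from the symbol name rather than something verified against this build.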