frmnboi commented on issue #10488:
URL: https://github.com/apache/arrow/issues/10488#issuecomment-860285563


   Thank you everyone for helping me out with this example.  I think the issues were largely twofold, and both were answered by Maarten and Wes.
   
   These issues were:
   
   1. I was not linking libarrow_python, so the linker could not find the pyarrow symbols.  On my device, I link the shared library **libarrow_python.so.400**.
   2. I had not #included **caster.hpp** from the VAEX repo, so Pybind11 did not have the type casters required to convert Arrow data between Python and C++.
   
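   As a sanity check on the linking setup (this is generic pyarrow, nothing specific to my extension): pyarrow ships helpers that report exactly which headers and libraries an extension should be built against, which avoids guessing at names like **libarrow_python.so.400**:
   
   ```python
   import pyarrow as pa
   
   # Directories and names this pyarrow build expects extensions
   # to compile and link against.
   print(pa.get_include())       # header directory
   print(pa.get_library_dirs())  # directories containing libarrow etc.
   print(pa.get_libraries())     # library names to pass to the linker
   
   # The extension should link the same libarrow/libarrow_python that
   # pyarrow itself loads; mixing a system-wide Arrow install with
   # pyarrow's bundled copy puts two Arrows in one process.
   ```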
   However, I now seem to be running into an odd error where the critical 
**arrow::py::import_pyarrow()** is throwing a segmentation fault.  The behavior 
is a bit strange and varies between situations.  
   
   To debug, the modified code now looks like this:
   
   ```cpp
   #include <pybind11/pybind11.h>
   #include <Python.h>
   #include <arrow/python/pyarrow.h>
   #include <arrow/array/builder_primitive.h>
   #include "caster.hpp"
   
   #include <iostream>
   
   std::shared_ptr<arrow::DoubleArray> vol_adj_close(std::shared_ptr<arrow::DoubleArray>& close, std::shared_ptr<arrow::Int64Array>& volume)
   {
       std::cout<<"arrow function called"<<std::endl;
       if (close->length()!=volume->length())
           throw std::length_error("Arrays are not of equal length");
       std::cout<<"length check passed"<<std::endl;
       arrow::DoubleBuilder builder;
       arrow::Status status = builder.Resize(close->length());
       if (!status.ok()) {
           throw std::bad_alloc();
       }
       std::cout<<"resize called"<<std::endl;
       for (int64_t i = 0; i < volume->length(); i++) {
           builder.UnsafeAppend(close->Value(i) / volume->Value(i));
       }
       std::cout<<"appended data (via unsafe call)"<<std::endl;
       std::shared_ptr<arrow::DoubleArray> array;
       arrow::Status st = builder.Finish(&array);
       if (!st.ok()) {  // check the status from Finish(), not the earlier Resize()
           throw std::bad_alloc();
       }
       std::cout<<"returning array"<<std::endl;
       return array;
   }
   
   int import_pyarrow()
   {
       return arrow::py::import_pyarrow();
   }
   
   
   PYBIND11_MODULE(helperfuncs, m) {
       // arrow::py::import_pyarrow();
       m.doc() = "Pyarrow Extensions";
       m.def("vol_adj_close", &vol_adj_close, 
pybind11::call_guard<pybind11::gil_scoped_release>());
       m.def("import_pyarrow",&import_pyarrow);
       m.def("import_pyarrow2",&arrow::py::import_pyarrow);
   }
   ```
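   For reference, a quick Python-side driver to exercise **vol_adj_close** once the module imports cleanly (the data is made up; the module name is the one built above):
   
   ```python
   import pyarrow as pa
   
   # Inputs matching the C++ signature: a float64 array and an int64
   # array of equal length.  (A zero volume would divide by zero in
   # the C++ loop, so avoid it in test data.)
   close = pa.array([10.0, 12.5, 9.75], type=pa.float64())
   volume = pa.array([100, 250, 400], type=pa.int64())
   
   # Once the extension loads without crashing:
   #   import helperfuncs
   #   result = helperfuncs.vol_adj_close(close, volume)
   print(close.type, volume.type)
   ```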
   
   
   When the compiled **helperfuncs** extension is imported into the Python file I want to use it in while debugging, it imports without any problems, and I can even call the wrapped **import_pyarrow()** or the unwrapped **import_pyarrow2()** function; both successfully return 0.  However, it throws a segmentation fault on the first **UnsafeAppend**.
   
   It behaves a bit differently when I import it in a standalone Python interpreter shell.  It leads to two different memory-related errors depending on how it's imported:
   
     
   ```
   import helperfuncs
   helperfuncs.import_pyarrow() #call to the "wrapped" import_pyarrow()
   
   Segmentation fault (core dumped)
   ```
   
   or 
   
   ```
   import helperfuncs
   helperfuncs.import_pyarrow2() #call to the "unwrapped" import_pyarrow()
   
   free(): double free detected in tcache 2
   Aborted (core dumped)
   ```
   
   Uncommenting the **arrow::py::import_pyarrow();** line in the **PYBIND11_MODULE** function, which is how the code appears in the VAEX repository, also fails, but with a segmentation fault during import.
   
   
   Does anyone know why **import_pyarrow()** succeeds when called from my Python file but not from a raw Python interpreter shell, and whether I am missing something that is causing these segmentation faults?  Is **libarrow_python.so.400** the appropriate pyarrow library to link against?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

