penguin-wwy commented on issue #1887:
URL: https://github.com/apache/fury/issues/1887#issuecomment-2431771329

   Hi, I ran comparative experiments with [pybind11](https://github.com/pybind/pybind11), [nanobind](https://github.com/wjakob/nanobind), and hand-written C-API code, and compared them against Cython.
   
   - pybind11 had the worst performance, which matches my expectations: it does not apply scenario-specific optimizations, and its type conversions are relatively heavy. However, its code is the simplest to maintain. Bindings that only need to expose the API can be written as follows:
   ```c++
   #include <cstdint>
   #include <pybind11/pybind11.h>
   // plus the Fury header that declares fury::Buffer and fury::AllocateBuffer

   namespace py = pybind11;

   PYBIND11_MODULE(_pyutil, util_mod) {
     py::class_<fury::Buffer>(util_mod, "Buffer")
         .def(py::init<>())
         .def("own_data", &fury::Buffer::own_data)
         .def("reserve", &fury::Buffer::Reserve)
         .def("put_bool", [](fury::Buffer &self, uint32_t offset,
                             bool v) { self.UnsafePutByte(offset, v); })
         .def("put_int8", [](fury::Buffer &self, uint32_t offset,
                             int8_t v) { self.UnsafePutByte(offset, v); })
         .def("get_bool", &fury::Buffer::GetBool)
         .def("get_int8", &fury::Buffer::GetInt8)
         ...
         .def_static("allocate", [](uint32_t size) { return fury::AllocateBuffer(size); });
   }
   ```
   ```
   
   - nanobind's performance is slightly better than Cython's, and its binding API is very similar to pybind11's. However, it only supports Python 3.8+.
   
   - Hand-written C-API code can outperform Cython if it is optimized for specific Python versions (especially >= 3.11). However, it works against the goal of keeping the code easy to maintain. For example:
     - When generating the wrapper for `get_bool`, Cython emits redundant checks, and because it picks a suboptimal `ml_flags` value (`METH_FASTCALL | METH_KEYWORDS` instead of `METH_O`, which suits a method taking a single argument), argument parsing adds further overhead. A hand-written version looks like this:
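   ```c++
   #include <Python.h>
   #include <cassert>
   #include <cstdint>

   // CBufferObject: the extension type wrapping a fury::Buffer (definition omitted).
   static PyObject *
   cbuffer_get_bool(CBufferObject *self, PyObject *offset)
   {
       // METH_O: CPython passes the single positional argument directly.
       long off_val = PyLong_AsLong(offset);
       if (off_val == -1 && PyErr_Occurred()) {
           return nullptr;  // not an int, or too large for a C long
       }
       assert(off_val >= 0 && off_val <= UINT32_MAX);
       // Py_NewRef requires Python 3.10+.
       return self->buffer->GetBool(static_cast<uint32_t>(off_val))
                  ? Py_NewRef(Py_True)
                  : Py_NewRef(Py_False);
   }

   static PyMethodDef cbuffer_methods[] = {
       {"get_bool", (PyCFunction)cbuffer_get_bool, METH_O, nullptr},
       ...
       {nullptr, nullptr, 0, nullptr}  /* sentinel */
   };
   ```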
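To see where the extra parsing cost comes from, the two calling conventions can be modeled in plain C++ without Python headers. This is a simplified sketch, not CPython's or Cython's actual code: `Object` is a stand-in for `PyObject *`, `std::size_t` for `Py_ssize_t`, and `get_bool_impl` is a hypothetical stand-in for `fury::Buffer::GetBool`. As in CPython's vectorcall protocol, keyword values are placed in `args` after the positional ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified model, not CPython or Cython source: Object stands in for
// PyObject*, and std::size_t stands in for Py_ssize_t.
struct Object {
  long value;  // the Python int passed as `offset`
};

// Hypothetical stand-in for fury::Buffer::GetBool(offset).
static bool get_bool_impl(const std::vector<std::uint8_t>& buf, long offset) {
  assert(offset >= 0);
  return buf.at(static_cast<std::size_t>(offset)) != 0;
}

// METH_O shape: the interpreter has already rejected keywords and any arity
// other than one, so the wrapper only converts its single argument.
bool call_meth_o(const std::vector<std::uint8_t>& buf, const Object* arg) {
  return get_bool_impl(buf, arg->value);
}

// METH_FASTCALL | METH_KEYWORDS shape: the wrapper gets an argument array, a
// positional count and keyword names (whose values follow the positional ones
// in `args`), so it must locate `offset` itself on every call.
// std::nullopt models raising TypeError.
std::optional<bool> call_fastcall_kw(const std::vector<std::uint8_t>& buf,
                                     const Object* const* args, std::size_t nargs,
                                     const std::vector<std::string>& kwnames) {
  const Object* offset = nullptr;
  if (nargs == 1 && kwnames.empty()) {
    offset = args[0];  // get_bool(1)
  } else if (nargs == 0 && kwnames.size() == 1 && kwnames[0] == "offset") {
    offset = args[0];  // get_bool(offset=1): keyword value at args[nargs + 0]
  } else {
    return std::nullopt;  // wrong arity or unexpected keyword
  }
  return get_bool_impl(buf, offset->value);
}
```

`call_meth_o` does nothing but the conversion and the call, while `call_fastcall_kw` repeats the arity and keyword bookkeeping on every invocation; that bookkeeping is the overhead attributed above to the `ml_flags` choice.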
   
   Additionally, after analyzing the Cython code, I found that some optimizations are possible by calling certain C-API functions directly in the .pyx file. The idea is to use higher-level knowledge about the code to keep Cython from generating certain guard code. I plan to submit these optimizations as a PR in the future.

