penguin-wwy commented on issue #1887:
URL: https://github.com/apache/fury/issues/1887#issuecomment-2528755127

   I have gained new insights into this issue:
   
   I found that pybind is not always a good fit for pyfury. In other projects 
that use pybind, it usually serves as a thin wrapper layer that exposes 
interfaces and passes data across the boundary, while the core logic is 
independent of the Python object structure.
   
   pyfury, however, is inherently tied to the Python object structure. The more 
precisely we can parse that structure, the more opportunities we have to 
optimize and reduce overhead.
   
   For example, consider the Python dict type. In the CPython dict structure, a 
single contiguous block of memory stores the index table followed by the 
key-value entries. By reading this block directly, we can traverse a dict as 
efficiently as an array or list. However, this layout is internal to CPython, 
and the way to access it varies across versions.
   
   ```
          +----------------------------------------------------
          |  dk_indices | PyDictKeyEntry | PyDictKeyEntry | ...
          +----------------------------------------------------
          ^
          |
   dict->ma_keys->dk_indices
   ```
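
To make the layout in the diagram concrete, here is a minimal pure-Python model of it. This is not CPython's actual implementation: the class name `CompactDict`, its attributes, and the linear probing scheme are assumptions made for illustration (CPython uses a different probe sequence and also handles deletion). The point is only that traversal is a plain scan of the dense entry array and never touches the sparse index table.

```python
from typing import Any, Hashable, Iterator, List, Tuple

FREE = -1  # marks an unused slot in the index table


class CompactDict:
    """Toy model: a sparse index table plus a dense, insertion-ordered entry array."""

    def __init__(self, size: int = 8) -> None:
        # size must be a power of two so that `hash & mask` selects a slot
        self.indices: List[int] = [FREE] * size  # plays the role of dk_indices
        self.entries: List[Tuple[int, Hashable, Any]] = []  # (hash, key, value)

    def _probe(self, key: Hashable, h: int) -> int:
        """Return the slot that holds `key`, or the first free slot on its probe path."""
        mask = len(self.indices) - 1
        slot = h & mask
        while True:
            idx = self.indices[slot]
            if idx == FREE or self.entries[idx][1] == key:
                return slot
            slot = (slot + 1) & mask  # simplified linear probing

    def _resize(self) -> None:
        """Double the index table and re-register every entry; entries do not move."""
        self.indices = [FREE] * (len(self.indices) * 2)
        for idx, (h, key, _) in enumerate(self.entries):
            self.indices[self._probe(key, h)] = idx

    def __setitem__(self, key: Hashable, value: Any) -> None:
        h = hash(key)
        slot = self._probe(key, h)
        idx = self.indices[slot]
        if idx != FREE:  # existing key: overwrite in place, order unchanged
            self.entries[idx] = (h, key, value)
            return
        if (len(self.entries) + 1) * 3 > len(self.indices) * 2:  # keep load <= 2/3
            self._resize()
            slot = self._probe(key, h)
        self.indices[slot] = len(self.entries)
        self.entries.append((h, key, value))

    def __getitem__(self, key: Hashable) -> Any:
        idx = self.indices[self._probe(key, hash(key))]
        if idx == FREE:
            raise KeyError(key)
        return self.entries[idx][2]

    def items(self) -> Iterator[Tuple[Hashable, Any]]:
        # Traversal is an array scan over the entries; no hashing, no index lookups.
        for _, key, value in self.entries:
            yield key, value
```

A serializer that can see `entries` directly only needs the equivalent of `items()`, which is why direct access to the entry array is attractive for writing a dict to a buffer.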
   
   Therefore, we need two kinds of code: general-purpose access code for Python 
objects, and high-performance access code tailored to specific types and 
CPython versions.
   
   * For general Python objects, Cython is more suitable than pybind. For 
example, in the current list or set serialization protocol code, using pybind 
would mean calling the C-API by hand and dealing with error handling, 
reference counting, and version differences ourselves, which may be harder to 
maintain than the Cython equivalent.
   
   * For high-performance specialized objects, there is no difference between 
using Cython or pybind, since either way it comes down to manually parsing the 
PyObject layout.
   
   Therefore, my idea is to use Cython more sparingly: write the general 
serialization protocols in Cython and move the specialized parts to C++.
   
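   ```
   // cpp (sketch; the version conditions are placeholders)
   void _write_pydict_to_buffer(PyDictObject *dict, Buffer *buffer) {
   #if PYTHON_VERSION   // versions whose dict layout is parsed directly
       PyDictKeyEntry *entries = dict->ma_keys->dk_indices[...]
   #elif PYTHON_VERSION // other versions: fall back to the public C-API
       PyDict_Next(...)
   #endif
   }
   
   # cython
   cdef class SetSerializer:
       ...
   
   cdef class ListSerializer:
       ...
   
   cdef class MapSerializer:
       cpdef inline write(self, Buffer buffer, o):
           # hand the raw dict object to the specialized C++ routine
           _write_pydict_to_buffer(<PyDictObject *>o, buffer.c_ptr())
   ```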
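
The split described above (a general protocol layer that hands off to a type- and version-specific routine) can also be sketched in plain Python. Everything here is hypothetical: `write_dict_generic`, `write_dict_fast`, the `FAST_PATHS` table, and the version numbers in it are placeholders, and the "fast" function only stands in for a C++ routine that would read the dict layout directly.

```python
import sys
from typing import Callable, Dict, List, Tuple

Writer = Callable[[dict, List], None]


def write_dict_generic(d: dict, out: List) -> None:
    """Portable path: relies only on the public iteration protocol."""
    out.append(len(d))
    for key, value in d.items():
        out.append(key)
        out.append(value)


def write_dict_fast(d: dict, out: List) -> None:
    """Stand-in for a layout-aware routine; it must emit the same data as the generic path."""
    out.append(len(d))
    out.extend(x for pair in d.items() for x in pair)


# Hypothetical table of interpreter versions with a verified specialized path.
FAST_PATHS: Dict[Tuple[int, int], Writer] = {
    (3, 11): write_dict_fast,
    (3, 12): write_dict_fast,
}


def select_writer(version: Tuple[int, int] = sys.version_info[:2]) -> Writer:
    """Pick the specialized writer when one exists, otherwise the generic one."""
    return FAST_PATHS.get(version, write_dict_generic)


class MapSerializer:
    """General protocol layer: chooses its writer once, then delegates."""

    def __init__(self, version: Tuple[int, int] = sys.version_info[:2]) -> None:
        self._write = select_writer(version)

    def write(self, out: List, obj: dict) -> None:
        self._write(obj, out)
```

Choosing the writer once at construction keeps the version check out of the per-object path, which mirrors what the `#if` branches would do at compile time in the C++ version.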


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

