This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fury.git


The following commit(s) were added to refs/heads/main by this push:
     new 3865dcd0 perf(python): Directly access the key-value pairs of a dict (#1970)
3865dcd0 is described below

commit 3865dcd0982c0ca9de04a8ba9635892b76288769
Author: penguin_wwy <[email protected]>
AuthorDate: Sun Dec 8 00:09:32 2024 +0800

    perf(python): Directly access the key-value pairs of a dict (#1970)
    
    ## What does this PR do?
    
    In CPython, a dict stores its key-value pairs in a linear memory
    structure, so in principle they can be traversed in insertion order
    like an array. However, Cython does not expose a direct-access
    interface for this, and the relevant interfaces are internal to
    CPython, so using them correctly would require compatibility work.
    Instead, this PR uses the public `PyDict_Next` interface in place of
    the `items` method. Essentially, `items` uses `PyDict_Next` to append
    the pairs to a list; calling `PyDict_Next` directly avoids that
    copying overhead.
    
    ## Related issues
    
    ## Does this PR introduce any user-facing change?
    
    - [ ] Does this PR introduce any public API change?
    - [ ] Does this PR introduce any binary protocol compatibility change?
    
    ## Benchmark
    For a large dict:
    ```
    [dict_item] 541 us +- 39 us -> [dict_next]  535 us +- 35 us: 1.00x faster
    
    [dict_item] 119.8 MiB +- 1344.0 KiB -> [dict_next] 118.8 MiB +- 1338.4 KiB: 1.01x faster
    ```
---
 python/pyfury/_serialization.pyx | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/python/pyfury/_serialization.pyx b/python/pyfury/_serialization.pyx
index ce1443c6..74bd755b 100644
--- a/python/pyfury/_serialization.pyx
+++ b/python/pyfury/_serialization.pyx
@@ -44,6 +44,7 @@ from pyfury.util import is_little_endian
 from libc.stdint cimport *
 from libcpp.vector cimport vector
 from cpython cimport PyObject
+from cpython.dict cimport PyDict_Next
 from cpython.ref cimport *
 from cpython.list cimport PyList_New, PyList_SET_ITEM
 from cpython.tuple cimport PyTuple_New, PyTuple_SET_ITEM
@@ -2049,7 +2050,13 @@ cdef class MapSerializer(Serializer):
         buffer.write_varint32(len(value))
         cdef ClassInfo key_classinfo
         cdef ClassInfo value_classinfo
-        for k, v in value.items():
+        cdef int64_t key_addr, value_addr
+        cdef Py_ssize_t pos = 0
+        while PyDict_Next(value, &pos, <PyObject **>&key_addr, <PyObject **>&value_addr) != 0:
+            k = int2obj(key_addr)
+            Py_INCREF(k)
+            v = int2obj(value_addr)
+            Py_INCREF(v)
             key_cls = type(k)
             if key_cls is str:
                 buffer.write_int16(NOT_NULL_STRING_FLAG)
@@ -2122,7 +2129,13 @@ cdef class MapSerializer(Serializer):
     cpdef inline xwrite(self, Buffer buffer, o):
         cdef dict value = o
         buffer.write_varint32(len(value))
-        for k, v in value.items():
+        cdef int64_t key_addr, value_addr
+        cdef Py_ssize_t pos = 0
+        while PyDict_Next(value, &pos, <PyObject **>&key_addr, <PyObject **>&value_addr) != 0:
+            k = int2obj(key_addr)
+            Py_INCREF(k)
+            v = int2obj(value_addr)
+            Py_INCREF(v)
             self.fury.xserialize_ref(
                 buffer, k, serializer=self.key_serializer
             )

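The loops in the diff above rely on the C-API contract of `PyDict_Next`. The function takes an opaque position cursor that it advances itself (starting from 0) and returns a nonzero value while entries remain. It also returns *borrowed* references to each key and value. As a rough, hedged illustration of that contract only, the sketch below calls the same function from pure Python through `ctypes.pythonapi` (CPython only). `dict_next_items` and `_borrowed_to_object` are hypothetical helpers written for this illustration; they are not part of pyfury. Each borrowed address is converted into an owned reference before it is yielded, and the set of keys must not change during iteration.

```python
import ctypes

# CPython C-API signature:
#   int PyDict_Next(PyObject *p, Py_ssize_t *ppos, PyObject **pkey, PyObject **pvalue)
# Functions looked up on ctypes.pythonapi are called with the GIL held.
_PyDict_Next = ctypes.pythonapi.PyDict_Next
_PyDict_Next.argtypes = [
    ctypes.py_object,                  # the dict, passed as a PyObject *
    ctypes.POINTER(ctypes.c_ssize_t),  # in/out cursor, must start at 0
    ctypes.POINTER(ctypes.c_void_p),   # receives a borrowed key pointer
    ctypes.POINTER(ctypes.c_void_p),   # receives a borrowed value pointer
]
_PyDict_Next.restype = ctypes.c_int


def _borrowed_to_object(addr):
    # Hypothetical helper: turn a borrowed PyObject * address into a normal
    # Python reference. py_object.value increments the refcount, so the
    # caller owns the returned reference.
    return ctypes.cast(addr, ctypes.py_object).value


def dict_next_items(d):
    """Hypothetical helper: yield (key, value) pairs of d via PyDict_Next.

    The set of keys in d must not change while the generator is running.
    """
    if not isinstance(d, dict):
        raise TypeError("dict_next_items() expects a dict")
    pos = ctypes.c_ssize_t(0)  # opaque cursor, advanced by PyDict_Next itself
    key = ctypes.c_void_p()
    value = ctypes.c_void_p()
    # PyDict_Next returns 0 once every entry has been visited.
    while _PyDict_Next(d, ctypes.byref(pos), ctypes.byref(key), ctypes.byref(value)):
        yield _borrowed_to_object(key.value), _borrowed_to_object(value.value)
```

For example, `list(dict_next_items({"a": 1, "b": 2}))` visits the entries in insertion order, just like `list(d.items())`. Because every step goes through ctypes, this pure-Python version is likely slower than `dict.items()`. It only shows the call pattern that the Cython code uses at C level.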
