ZS, 28.02.2013 19:31:
> 2013/2/28 ZS:
>> Looking into the IndexNode class in ExprNodes.py, I see a possibility
>> of adding a faster code path for unicode[index], as is done in
>> method `generate_setitem_code` for lists.
>>
>> These are the files for evaluating the performance difference:
>>
>> #### unicode_index.h
>>
>> /* This is a stripped version of __Pyx_GetItemInt_Unicode_Fast */
>> #include "unicodeobject.h"
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
>> #if CYTHON_PEP393_ENABLED
>>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
>> #endif
>>     return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>> }
Sure, looks ok.

>> ##### unicode_index.pyx
>>
>> # coding: utf-8
>>
>> cdef extern from 'unicode_index.h':
>>     inline Py_UCS4 unicode_char(unicode ustring, int i)
>>
>> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"
>>
>> def f_1(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:

Personally, I find a range() loop much easier to read than this beast.

>>         for i from 0<=i<=n-1:
>>             ch = text[i]
>>
>> def f_2(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:
>>         for i from 0<=i<=n-1:
>>             ch = unicode_char(text, i)
>>
>> def test_1():
>>     f_1(text)
>>
>> def test_2():
>>     f_2(text)
>>
>> Timing results:
>>
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_1" "test_1()"
>> 100 loops, best of 10: 89 msec per loop
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_2" "test_2()"
>> 100 loops, best of 10: 46.1 msec per loop

I seriously doubt that this translates to similar results in real-world
code. In the second example above, the C compiler should be able to remove
a lot of code, certainly including the useless character read. Maybe even
the loops, if it can determine that PyUnicode_READY() will always return
the same result. So you're almost certainly not benchmarking what you think
you are.

>> in setup.py globally:
>>
>> "boundscheck": False
>> "wraparound": False
>> "nonecheck": False
>>
> For the sake of clarity I would like to add the following... This
> optimization is for the case when both `boundscheck(False)` and
> `wraparound(False)` are applied. Otherwise the default evaluation path
> (__Pyx_GetItemInt_Unicode) is used.
>
> This allows writing unicode text parsing code at almost C speed,
> mostly in Python (+ .pxd definitions).
I suggest simply adding a constant flag argument to the existing function
that states if checking should be done or not. Inlining will let the C
compiler drop the corresponding code, which may or may not make it a little
faster.

Stefan
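To illustrate the constant-flag idea, here is a minimal sketch in plain C. It deliberately avoids the CPython headers: `ucs4_buf`, `ucs4_get`, `UCS4_ERROR` and the `sample` data are hypothetical names made up for this example, not Cython's actual `__Pyx_GetItemInt_Unicode` helper or its signature. The point is only that a single inline function can serve both the checked and the unchecked path when the flags are passed as literal constants, since the compiler can then discard the branches that are statically dead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a string of code points; not a CPython type. */
typedef struct {
    const uint32_t *data;
    ptrdiff_t length;
} ucs4_buf;

/* Hypothetical error marker, mirroring the (Py_UCS4)-1 convention above. */
#define UCS4_ERROR ((uint32_t)-1)

/* One helper for both paths. `wraparound` and `boundscheck` are meant to be
 * passed as literal 0 or 1, so that after inlining only the branches that
 * are actually needed remain in the caller. */
static inline uint32_t ucs4_get(const ucs4_buf *s, ptrdiff_t i,
                                int wraparound, int boundscheck) {
    if (wraparound && i < 0)
        i += s->length;                 /* Python-style negative index */
    if (boundscheck && (i < 0 || i >= s->length))
        return UCS4_ERROR;              /* out of range */
    return s->data[i];
}

static const uint32_t sample_data[] = { 'a', 'b', 'c', 'd' };
static const ucs4_buf sample = { sample_data, 4 };

/* Checked variant: both flags are constant 1. The sum is returned so that
 * every character read contributes to an observable result. */
static uint32_t sum_checked(const ucs4_buf *s) {
    uint32_t total = 0;
    for (ptrdiff_t i = 0; i < s->length; i++)
        total += ucs4_get(s, i, 1, 1);
    return total;
}

/* Unchecked variant: both flags are constant 0, so the two `if` branches
 * in ucs4_get() are dead code once the call is inlined. */
static uint32_t sum_unchecked(const ucs4_buf *s) {
    uint32_t total = 0;
    for (ptrdiff_t i = 0; i < s->length; i++)
        total += ucs4_get(s, i, 0, 0);
    return total;
}

/* Usage check: for in-range indices both call styles agree. */
static void ucs4_get_demo(void) {
    assert(ucs4_get(&sample, 1, 1, 1) == ucs4_get(&sample, 1, 0, 0));
    assert(sum_checked(&sample) == sum_unchecked(&sample));
}
```

Whether the unchecked call is measurably faster in practice depends on the compiler and the surrounding code, so it would need to be measured on code that actually uses the characters it reads.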