Matthew Honnibal wrote:
> Hi,
> I've only just started using Cython today, and I'm having trouble with
> the buffer interface indexing described here:
> http://wiki.cython.org/enhancements/buffer . I want to iterate over a
> unicode string getting contiguous subsequences.
>   
The buffer PEP is *available* in Python 2.6; however, I don't think any
objects in the Python standard library export their buffers using it,
unfortunately.

What you can try is to use the backwards-compatibility mechanism of
implementing __getbuffer__ in Cython, something like (untested):

from python_unicode cimport Py_UNICODE

cdef extern from "Python.h": # Or Python's unicodeobject.h
    ctypedef class unicode [object PyUnicodeObject]:
        Py_ssize_t length
        Py_UNICODE *str
        def __getbuffer__(self, Py_buffer* buf, int flags):
            # ... fill in buf struct with PEP 3118 information to
            # export self.str/self.length ...

Notes:
a) If you only want to deal with unicodes, you can probably just as well 
drop __getbuffer__. With the declaration above, you can still do

cdef unicode u = myunicode
cdef Py_UNICODE *buf = u.str
print buf[3] # gets 4th unicode character

without any buffer support.

b) If you do write up a decent unicode declaration, make sure to 
contribute it to Cython's Cython/Includes/python_unicode.

c) If you go the __getbuffer__ route for more convenient syntax, be
aware that unicode types are not supported as buffer item types; you
need to export the data as "=I", "=H" or "=B" (unsigned int, short,
byte), depending on sizeof(Py_UNICODE) (see the struct module), and
then acquire a buffer through

cdef unicode[Py_UNICODE] u = myunicode
cdef Py_UNICODE onechar = u[3]
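
To make the choice of format string in (c) concrete, here is a small
pure-Python sketch using the struct module. It is only an illustration:
format_for_unit_size is a hypothetical helper name, not part of Cython
or Python, and the sizes 2 and 4 are assumptions about typical narrow
and wide (UCS2/UCS4) builds. In real code you would pass
sizeof(Py_UNICODE) from the Cython side.

```python
import struct

def format_for_unit_size(size):
    """Hypothetical helper: return the standard-size unsigned struct
    format code whose item size equals `size` bytes."""
    for code in ("=B", "=H", "=I"):
        # With the "=" prefix, struct uses standard sizes:
        # B is 1 byte, H is 2 bytes, I is 4 bytes.
        if struct.calcsize(code) == size:
            return code
    raise ValueError("no unsigned struct format with item size %d" % size)

# Assumed sizes: 2 bytes on a narrow build, 4 bytes on a wide build.
narrow_format = format_for_unit_size(2)
wide_format = format_for_unit_size(4)
```

The returned string is what would go into the format field of the
Py_buffer struct filled in by __getbuffer__.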

Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
