Hi,

Aaron DeVore wrote:
> cdef void append(UnicodeBuffer *ubuffer, unicode un):
>     Py_UNICODE *string = PyUnicode_AsUnicode(un)
>     ... append string to buffer ...
>
> I'm trying to find something where I could also append with a function
> like this one:
> 
> cdef void appendArray(UnicodeBuffer *ubuffer, Py_UNICODE *string, int length):
>     ...append string to buffer...

Why? Just use the PyUnicode_AS_UNICODE(un) macro to access the Py_UNICODE
buffer and PyUnicode_GET_SIZE(un) to get its length. That avoids any redundant
type checks or conversions, and you can keep your original function
without changing its signature.
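
To illustrate, here is a rough C sketch of the kind of append helper this
implies: it only needs a pointer and a length, which is exactly what the
macros give you. Note that the code_unit typedef, the UnicodeBuffer layout and
the append_array name are all made up for the example (your actual struct
isn't shown in this thread), so take it as a sketch, not as your implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE so the sketch builds without Python.h. */
typedef unsigned int code_unit;

/* Hypothetical layout; the real UnicodeBuffer is not shown in the thread. */
typedef struct {
    code_unit *data;
    size_t length;    /* code units in use */
    size_t capacity;  /* code units allocated */
} UnicodeBuffer;

/* Append `length` code units from `string`. Returns 0 on success and -1 on
   allocation failure, in which case the buffer is left unchanged. */
static int append_array(UnicodeBuffer *ubuffer, const code_unit *string,
                        size_t length)
{
    if (length > ubuffer->capacity - ubuffer->length) {
        size_t needed = ubuffer->length + length;
        size_t new_capacity = ubuffer->capacity ? ubuffer->capacity : 16;
        code_unit *grown;

        /* Grow geometrically so repeated appends stay cheap on average. */
        while (new_capacity < needed)
            new_capacity *= 2;
        grown = realloc(ubuffer->data, new_capacity * sizeof(code_unit));
        if (grown == NULL)
            return -1;
        ubuffer->data = grown;
        ubuffer->capacity = new_capacity;
    }
    if (length > 0)
        memcpy(ubuffer->data + ubuffer->length, string,
               length * sizeof(code_unit));
    ubuffer->length += length;
    return 0;
}
```

On the Cython side, append(ubuffer, un) would then simply pass the pointer and
the length taken from the unicode object on to such a helper.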

I may be biased since I've been working on the lxml XML library for quite a
while now, but may I ask why you use unicode strings and Py_UNICODE
internally, instead of a UTF-8 encoded byte buffer?


> One possibility is having several arrays of commonly used unicode
> strings sitting around. In that case render() from above might look
> like this:
> 
> cdef void render(UnicodeBuffer *ubuffer, Tag tag):
> ----appendArray(buffer, ustring_lt, 1)
> ----append(buffer, tag.name)
> ----...render attributes with more append calls...
> ----append(buffer, ustring_gt, 1)
> 
> What would be the best way to go about this?

Note that both unicode and byte strings are interned by Cython, so I'd just
write u"<" and u">", which I find the most readable.

A longer-term solution would be to make Py_UNICODE a known
numeric type in Cython, and to do the conversion on the fly, as in

        cdef unicode u = u"some text"
        cdef Py_UNICODE* buffer = u     # calls PyUnicode_AS_UNICODE(u)
        cdef Py_UNICODE  ch = u         # raises an exception as u is too long

That would be in line with the current byte string <-> char* conversion
(although I'm not sure about the exception, BTW).

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
