I'm currently exploring using Cython to provide new Python 3 bindings for Xapian. I'm pretty much a Cython n00b but the documentation is great, and I was able to pretty quickly get something really simple working. I'm using Cython 0.15 on Ubuntu 12.04 with Python 3.2 and Xapian 1.2.12. I've pushed my current branch to github:
https://github.com/warsaw/xapian/tree/py3/xapian-bindings/python3 There you'll see my xapianlib.pxd and xapian.pyx files. Where I'm seeing some odd behavior is in trying to expose the Xapian::TermGenerator.get_description() method. This returns a std::string and I'm trying to create a `description` property that coerces this to unicode before returning it to Python. Here's the relevant code: -----snip snip----- cdef class TermGenerator: cdef xapianlib.TermGenerator * _this def __cinit__(self): self._this = new xapianlib.TermGenerator() def __dealloc__(self): del self._this property description: def __get__(self): as_bytes = <char *>self._this.get_description().c_str() #return as_bytes return as_bytes.decode('utf-8') -----snip snip----- I'm sure I'm doing something naive or stupid, but the problem is that as written above, .description is returning nonsense. % python Python 3.2.3 (default, May 3 2012, 15:51:42) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import xapian >>> tg = xapian.TermGenerator() >>> tg.description '\x00\x00\x00\x00_\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' If instead, I return just the bytes object (i.e. what .get_description().c_str() returns), then I get more like what I expect. % python Python 3.2.3 (default, May 3 2012, 15:51:42) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import xapian >>> tg = xapian.TermGenerator() >>> tg.description b'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)' >>> tg.description.decode('utf-8') 'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)' I looked at the generated code in the first example, but didn't really see anything obvious. There are no NULs in the char* description afaict. I haven't yet tested Cython 0.16 or 0.17 to see if this behaves differently. Is this a bug or am I doing something stupid? Cheers, -Barry
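
For comparing the behavior across Cython versions, a small check along these lines might make the good and bad results easy to tell apart mechanically. It is only a sketch: `check_description` is a hypothetical helper that is not part of the bindings, and the expected prefix is an assumption taken from the output shown above. The `bad` sample is a shortened form of the garbage string, not the full value.

```python
# Sketch only: `check_description` is a hypothetical helper, not part of the
# Xapian bindings. Pass it whatever `tg.description` returned.

def check_description(value):
    """Return a list of problems found in a `description` value (empty if none)."""
    problems = []
    if not isinstance(value, str):
        # The property is meant to return unicode (str on Python 3), not bytes.
        problems.append('expected str, got %s' % type(value).__name__)
        return problems
    if '\x00' in value:
        problems.append('contains %d NUL character(s)' % value.count('\x00'))
    # Assumed prefix, based on the output observed in the session above.
    if not value.startswith('Xapian::TermGenerator('):
        problems.append('does not start with the expected prefix')
    return problems


# Sample inputs modeled on the two results reported above.
good = ('Xapian::TermGenerator(stem=Xapian::Stem(none), '
        'doc=Document(Xapian::Document::Internal()), termpos=0)')
bad = '\x00\x00\x00\x00_' + '\x00' * 8  # shortened form of the garbage output

print(check_description(good))
print(check_description(bad))
```

With the real module built, the same function could be called as `check_description(xapian.TermGenerator().description)` under each Cython version being compared.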
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel