I'm currently exploring using Cython to provide new Python 3 bindings for
Xapian.  I'm pretty much a Cython n00b but the documentation is great, and I
was able to pretty quickly get something really simple working.  I'm using
Cython 0.15 on Ubuntu 12.04 with Python 3.2 and Xapian 1.2.12.  I've pushed my
current branch to GitHub:

https://github.com/warsaw/xapian/tree/py3/xapian-bindings/python3

There you'll see my xapianlib.pxd and xapian.pyx files.

Where I'm seeing some odd behavior is in trying to expose the
Xapian::TermGenerator.get_description() method.  This returns a std::string
and I'm trying to create a `description` property that coerces this to unicode
before returning it to Python.  Here's the relevant code:

-----snip snip-----
cdef class TermGenerator:
    cdef xapianlib.TermGenerator * _this

    def __cinit__(self):
        self._this = new xapianlib.TermGenerator()

    def __dealloc__(self):
        del self._this

    property description:
        def __get__(self):
            as_bytes = <char *>self._this.get_description().c_str()
            #return as_bytes
            return as_bytes.decode('utf-8')
-----snip snip-----

I'm sure I'm doing something naive or stupid, but the problem is that
as written above, .description is returning nonsense.

% python
Python 3.2.3 (default, May  3 2012, 15:51:42) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xapian
>>> tg = xapian.TermGenerator()
>>> tg.description
'\x00\x00\x00\x00_\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

If I instead return just the bytes object (i.e. what
.get_description().c_str() returns, via the commented-out `return as_bytes`
line), then I get something much more like what I expect.

% python
Python 3.2.3 (default, May  3 2012, 15:51:42) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xapian
>>> tg = xapian.TermGenerator()
>>> tg.description
b'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'
>>> tg.description.decode('utf-8')
'Xapian::TermGenerator(stem=Xapian::Stem(none), doc=Document(Xapian::Document::Internal()), termpos=0)'

I looked at the generated code for the first example, but didn't see anything
obviously wrong.  There are no NULs in the char* description afaict.  I
haven't yet tested Cython 0.16 or 0.17 to see if they behave differently.
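
For anyone who wants to double-check the NUL observation from the Python
side, here is a minimal sketch.  The helper name nul_report and the SAMPLE
constant are made up for illustration (they are not part of the branch);
SAMPLE is simply the bytes value from the second session pasted in as a
literal, so this only inspects that value and says nothing about what the
C++ side is doing.

```python
# SAMPLE is an assumption for illustration: the bytes value shown in the
# interpreter session, pasted in as a literal.
SAMPLE = (
    b"Xapian::TermGenerator(stem=Xapian::Stem(none), "
    b"doc=Document(Xapian::Document::Internal()), termpos=0)"
)


def nul_report(raw):
    """Return (length, number of NUL bytes, decoded text) for a bytes value."""
    return len(raw), raw.count(b"\x00"), raw.decode("utf-8")


length, nuls, text = nul_report(SAMPLE)
print(length, nuls)
print(text)
```

With the real bindings, the same helper could be handed whatever
tg.description returns in the bytes-returning variant instead of SAMPLE.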

Is this a bug or am I doing something stupid?

Cheers,
-Barry

_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
