Hi Barry,

Barry Warsaw, 06.07.2012 00:29:
> I'm currently exploring using Cython to provide new Python 3 bindings for
> Xapian.  I'm pretty much a Cython n00b but the documentation is great, and I
> was able to pretty quickly get something really simple working.  I'm using
> Cython 0.15 on Ubuntu 12.04 with Python 3.2 and Xapian 1.2.12.  I've pushed my
> current branch to github:
> 
> https://github.com/warsaw/xapian/tree/py3/xapian-bindings/python3
> 
> There you'll see my xapianlib.pxd and xapian.pyx files.
> 
> Where I'm seeing some odd behavior is in trying to expose the
> Xapian::TermGenerator.get_description() method.  This returns a std::string
> and I'm trying to create a `description` property that coerces this to unicode
> before returning it to Python.  Here's the relevant code:
> 
> -----snip snip-----
> cdef class TermGenerator:
>     cdef xapianlib.TermGenerator * _this
> 
>     def __cinit__(self):
>         self._this = new xapianlib.TermGenerator()
> 
>     def __dealloc__(self):
>         del self._this
> 
>     property description:
>         def __get__(self):
>             as_bytes = <char *>self._this.get_description().c_str()
>             #return as_bytes
>             return as_bytes.decode('utf-8')
> -----snip snip-----
> 
> I'm sure I'm doing something naive or stupid, but the problem is that
> as written above, .description is returning nonsense.
> 
> % python
> Python 3.2.3 (default, May  3 2012, 15:51:42) 
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> tg = xapian.TermGenerator()
> >>> tg.description
> '\x00\x00\x00\x00_\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> 
> If instead, I return just the bytes object (i.e. what
> .get_description().c_str() returns), then I get more like what I expect.
> 
> % python
> Python 3.2.3 (default, May  3 2012, 15:51:42) 
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> tg = xapian.TermGenerator()
> >>> tg.description
> b'Xapian::TermGenerator(stem=Xapian::Stem(none), 
> doc=Document(Xapian::Document::Internal()), termpos=0)'
> >>> tg.description.decode('utf-8')
> 'Xapian::TermGenerator(stem=Xapian::Stem(none), 
> doc=Document(Xapian::Document::Internal()), termpos=0)'

This is very weird behaviour indeed. I wouldn't know why that should
happen. What "return as_bytes.decode('utf-8')" does is that it calls
strlen() to see how long the string is, then it calls the UTF-8 decode
C-API function with the pointer and that length.

The string that get_description() returns is allocated internally in the
C++ object, right? So it can't suddenly die or something?

One thing I would generally suggest is to do this:

    descr = self._this.get_description()
    return descr.data()[:descr.size()].decode('utf-8')

Avoids the call to strlen() by explicitly slicing the pointer. Also avoids
needing to make sure the C string is 0-terminated.
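
To make the difference between the two variants concrete, here is a minimal
sketch in plain Python using ctypes. It is not what Cython generates; the
buffer contents and all variable names are made up for illustration, and
ctypes.string_at() only stands in for the "strlen() then decode" and the
"explicit length then decode" code paths.

```python
import ctypes

# Hypothetical payload standing in for the contents of a std::string.
# It deliberately contains an embedded NUL byte.
payload = "stem=none\x00termpos=0".encode("utf-8")

# create_string_buffer() copies the bytes and appends a trailing NUL,
# similar to what c_str() guarantees for a std::string.
buf = ctypes.create_string_buffer(payload)
addr = ctypes.addressof(buf)
size = len(payload)

# strlen()-style: without a size, string_at() reads up to the first NUL byte.
by_strlen = ctypes.string_at(addr).decode("utf-8")

# Explicit length: read exactly `size` bytes, like data()[:size()].
by_size = ctypes.string_at(addr, size).decode("utf-8")

print(repr(by_strlen))
print(repr(by_size))
```

Both variants still read from memory that has to stay alive while it is
being decoded; the explicit length only removes the dependency on where the
first 0 byte happens to be.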


> I looked at the generated code in the first example, but didn't really see
> anything obvious.  There are no NULs in the char* description afaict.  I
> haven't yet tested Cython 0.16 or 0.17 to see if this behaves differently.

I wouldn't know of any differences off the top of my head, except that 0.17
has generally better support for STL containers and std::string (but that's
unrelated to this failure). I'm planning to enable direct support for
cpp_string.decode(...) as well, but that's not implemented yet. It would
basically generate the verbose code above automatically.


> Is this a bug or am I doing something stupid?

Definitely not doing something stupid, but I have no idea why this should
go wrong.

Stefan