Thanks for the follow up Stefan,

On Jul 06, 2012, at 06:48 AM, Stefan Behnel wrote:

>This is very weird behaviour indeed. I wouldn't know why that should
>happen. What "return as_bytes.decode('utf-8')" does is that is calls
>strlen() to see how long the string is, then it calls the UTF-8 decode
>C-API function with that.

It seems like either the strlen() or the cast through char* is the problem.

>The string that get_description() returns is allocated internally in the
>C++ object, right? So it can't suddenly die or something?

I don't think so.

>One thing I would generally suggest is to do this:
>
>    descr = self._this.get_description()
>    return descr.data()[:descr.size()].decode('utf-8')
>
>Avoids the call to strlen() by explicitly slicing the pointer. Also avoids
>needing to make sure the C string is 0-terminated.

According to http://www.cplusplus.com/reference/string/string/c_str/

    The returned array points to an internal location with the required
    storage space for this sequence of characters plus its terminating
    null-character, but the values in this array should not be modified in
    the program and are only guaranteed to remain unchanged until the next
    call to a non-constant member function of the string object.

I believe the const char* returned by c_str() is guaranteed to be null
terminated. AFAICT, there are no embedded NULs. I also don't think there are
any non-constant member function calls of the parent string object getting in
the way.

Next, I tried two different implementations:

    property description:
        def __get__(self):
            # works
            descr = self._this.get_description()
            return descr.c_str()[:descr.size()].decode('utf-8')

    property destruction:
        def __get__(self):
            # broken
            as_bytes = <char *>self._this.get_description().c_str()
            return as_bytes.decode('utf-8')

The second case requires the cast or you get an error:

    xapian.cpp:1409:67: error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]

but I don't think that's the problem.

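For reference, here is a minimal C++ sketch of the two length computations that the "works" and "broken" variants end up using, plus the pattern of keeping the string object alive while its pointer is in use. The get_description() below is a hypothetical stand-in, not the wrapped library's method, and it is assumed to return a std::string by value; the real declaration is not shown in this message. If that assumption holds, a pointer taken from the unnamed temporary would not stay valid past the statement that created it, which may be worth ruling out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the wrapped method (assumption: returns by value).
std::string get_description() {
    return "Xapian::Query(hello)";
}

// Length as the "broken" variant computes it: scan for the first NUL byte.
std::size_t length_via_strlen(const std::string& s) {
    return std::strlen(s.c_str());
}

// Length as the "works" variant computes it: ask the string object itself.
std::size_t length_via_size(const std::string& s) {
    return s.size();
}

// Safe pattern: bind the result to a named variable so the buffer outlives
// every use of the pointer taken from it.
std::string copy_bytes_safely() {
    const std::string descr = get_description();  // named object owns the buffer
    const char* p = descr.c_str();                // valid while descr is in scope
    assert(p[descr.size()] == '\0');              // c_str() is NUL-terminated
    return std::string(p, descr.size());          // explicit length, no strlen()
}

// Pattern to avoid (left as a comment on purpose, never executed):
//   const char* p = get_description().c_str();
//   // The temporary std::string is destroyed at the end of that statement,
//   // so a later strlen(p) would read storage that is no longer owned.
```

The two length functions agree whenever the string has no embedded NULs, so with the string held in a named variable either approach should decode the same bytes.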
Looking at the generated C++ code, I see these two different implementations:

works:

    __pyx_t_1 = ((PyObject *)PyUnicode_Decode(__pyx_v_descr.c_str(), __pyx_v_descr.size(), __pyx_k_1, NULL));
    if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 84; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

broken:

    __pyx_t_1 = ((PyObject *)PyUnicode_Decode(__pyx_v_as_bytes, strlen(__pyx_v_as_bytes), __pyx_k_1, NULL));
    if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 91; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

In the working case, __pyx_v_descr is a std::string, so the const char*
returned by .c_str() is passed directly to PyUnicode_Decode() without a cast.
The length is returned by std::string.size().

In the broken case, __pyx_v_as_bytes is a char* (I could not figure out how to
preserve the const char* type) and strlen() is used to find the length.

Those are the only substantive differences I could find.

>I wouldn't know any differences out of the top of my head, except that 0.17
>has generally better support for STL containers and std:string (but that's
>unrelated to this failure). I'm planning to enable direct support for
>cpp_string.decode(...) as well, but that's not implemented yet. It would
>basically generate the verbose code above automatically.
>
>> Is this a bug or am I doing something stupid?
>
>Definitely not doing something stupid, but I have no idea why this should
>go wrong.

Okay, at least I have a few workarounds :). I'd file a bug but I don't have
permission to file new issues. If you have any other suggestions for ways to
debug this, I'm happy to give them a try.

Cheers,
-Barry

_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel