Barry Warsaw, 06.07.2012 16:21:
> Thanks for the follow up Stefan,
>
> On Jul 06, 2012, at 06:48 AM, Stefan Behnel wrote:
>
>> This is very weird behaviour indeed. I wouldn't know why that should
>> happen. What "return as_bytes.decode('utf-8')" does is that it calls
>> strlen() to see how long the string is, then it calls the UTF-8 decode
>> C-API function with that.
>
> It seems like either the strlen() or the cast through char* is the problem.

Could you try it without the cast?

https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/tutorial/strings.html#dealing-with-const

In older Cython versions, you can use the declarations directly:

https://github.com/cython/cython/blob/master/Cython/Includes/libc/string.pxd#L3

I just noticed that .c_str() is incorrectly declared (without "const") in
Cython and that .data() is missing completely. I've pushed a fix for that
(note that the current master looks a bit broken, which is rather
unfortunate for testing).

>> One thing I would generally suggest is to do this:
>>
>>     descr = self._this.get_description()
>>     return descr.data()[:descr.size()].decode('utf-8')
>>
>> Avoids the call to strlen() by explicitly slicing the pointer. Also avoids
>> needing to make sure the C string is 0-terminated.
>
> According to
>
> http://www.cplusplus.com/reference/string/string/c_str/
>
>   The returned array points to an internal location with the required
>   storage space for this sequence of characters plus its terminating
>   null-character, but the values in this array should not be modified in the
>   program and are only guaranteed to remain unchanged until the next call to
>   a non-constant member function of the string object.
>
> I believe the const char* returned by c_str() is guaranteed to be null
> terminated. AFAICT, there are no embedded NULs. I also don't think there are
> any non-constant member function calls of the parent string object getting in
> the way.

Yes to all of the above. What I meant was that .c_str() may be less
efficient than .data() because the internal string buffer may not be
0-terminated originally.
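
The two ways of finding the length discussed above can be compared in plain C++, without Cython or Xapian. This is only a minimal sketch using the standard library; the helper names scanned_length, explicit_length and demo are hypothetical and appear nowhere in Cython's generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the byte count that a strlen()-based decode would see.
inline std::size_t scanned_length(const std::string& s) {
    return std::strlen(s.c_str());   // scans until the first 0 byte
}

// Hypothetical helper: the byte count that an explicit slice would pass along.
inline std::size_t explicit_length(const std::string& s) {
    return s.size();                 // stored length, no scan, embedded 0 bytes count
}

// Small self-check showing when the two agree and when they differ.
inline void demo() {
    const std::string plain("Xapian");
    assert(scanned_length(plain) == explicit_length(plain));

    const std::string with_nul("ab\0cd", 5);   // 5 bytes, one embedded 0 byte
    assert(explicit_length(with_nul) == 5);
    assert(scanned_length(with_nul) == 2);
}
```

In the thread's terms, scanned_length corresponds to as_bytes.decode('utf-8') and explicit_length to descr.data()[:descr.size()].decode('utf-8'). For a description without embedded 0 bytes both give the same count, so this difference on its own would not explain the reported failure.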

> Next, I tried two different implementations:
>
>     property description:
>         def __get__(self):
>             # works
>             descr = self._this.get_description()
>             return descr.c_str()[:descr.size()].decode('utf-8')
>
>     property destruction:
>         def __get__(self):
>             # broken
>             as_bytes = <char *>self._this.get_description().c_str()
>             return as_bytes.decode('utf-8')
>
> The second case requires the cast or you get an error:
>
> xapian.cpp:1409:67: error: invalid conversion from ‘const char*’ to ‘char*’
> [-fpermissive]
>
> but I don't think that's the problem. Looking at the generated C++ code, I
> see these two different implementations:
>
> works:
>
>   __pyx_t_1 = ((PyObject *)PyUnicode_Decode(__pyx_v_descr.c_str(),
>   __pyx_v_descr.size(), __pyx_k_1, NULL)); if (unlikely(!__pyx_t_1))
>   {__pyx_filename = __pyx_f[0]; __pyx_lineno = 84; __pyx_clineno = __LINE__;
>   goto __pyx_L1_error;}
>
> broken:
>
>   __pyx_t_1 = ((PyObject *)PyUnicode_Decode(__pyx_v_as_bytes,
>   strlen(__pyx_v_as_bytes), __pyx_k_1, NULL)); if (unlikely(!__pyx_t_1))
>   {__pyx_filename = __pyx_f[0]; __pyx_lineno = 91; __pyx_clineno = __LINE__;
>   goto __pyx_L1_error;}
>
> In the working case, __pyx_v_descr is a std::string, so the const char*
> returned by .c_str() is passed directly to PyUnicode_Decode() without a cast.
> The length is returned by std::string.size().
>
> In the broken case, __pyx_v_as_bytes is a char* (I could not figure out how to
> preserve the const char* type) and strlen() is used to find the length.
>
> Those are the only substantive differences I could find.

Maybe the C++ compiler is going mad because of the cast that kills "const"?

>> I wouldn't know any differences off the top of my head, except that 0.17
>> has generally better support for STL containers and std::string (but that's
>> unrelated to this failure). I'm planning to enable direct support for
>> cpp_string.decode(...) as well, but that's not implemented yet. It would
>> basically generate the verbose code above automatically.
>>
>>> Is this a bug or am I doing something stupid?
>>
>> Definitely not doing something stupid, but I have no idea why this should
>> go wrong.
>
> Okay, at least I have a few workarounds :). I'd file a bug but I don't have
> permission to file new issues.

Please send a htpasswd entry to me or Robert.

> If you have any other suggestions for ways to debug this, I'm happy to give
> them a try.

Could you try to reproduce this without needing the Xapian library? It would
be good to have a (failing) test case.

Stefan
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel