On Thu, Mar 19, 2009 at 6:40 AM, Ralf W. Grosse-Kunstleve <r...@yahoo.com> wrote:
>
> I tried the code below with Python 2.x. For a given str or unicode object, it
> copies the bytes in memory (char*) to a list of 1-character strings. I'm getting
>
>   "hello" = ['h', 'e', 'l', 'l', 'o']
>   u"hello" = ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
>   U"hello" = ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
>
> on platforms with sizeof(PY_UNICODE_TYPE) = 2 and
>
>   "hello" = ['h', 'e', 'l', 'l', 'o']
>   u"hello" = ['h', '\x00', '\x00', '\x00', 'e', '\x00', '\x00', '\x00', 'l',
>   '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'o', '\x00', '\x00',
>   '\x00']
>   U"hello" = ['h', '\x00', '\x00', '\x00', 'e', '\x00', '\x00', '\x00', 'l',
>   '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'o', '\x00', '\x00',
>   '\x00']
>
> on platforms with sizeof(PY_UNICODE_TYPE) = 4.
>
> Will the results be different using Python 3?

The result in Python 3 will be:

  "hello" = ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
  b"hello" = ['h', 'e', 'l', 'l', 'o']

on platforms with sizeof(PY_UNICODE_TYPE) = 2, and u"hello" is no longer valid
syntax by then.

> I have quite a few C++ functions with const char* arguments, expecting one
> byte per character.
>
>> - convert char* and std::string to/from Python 3 unicode string.
>
> How would this work exactly?
> Is the plan to copy the unicode data to a temporary one-byte-per-character
> buffer?
>

Of course, the default converter policy we plan is not to hand out the unicode
object's raw data buffer via PyUnicode_AS_DATA(). Instead, C-API functions such
as PyUnicode_AsUTF8String() and PyUnicode_AsEncodedString() will be used to
convert the unicode object to bytes, and the resulting char* is then passed to
your C++ function. By default we would use PyUnicode_AsUTF8String(); a
different encoding could be specified explicitly through a converter policy.
That should keep most of your code compatible.

Thanks!

-- 
Haoyu Bai
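
To illustrate the difference between the raw buffer and the encoded bytes, here is a rough sketch in pure Python 3. It does not call the C-API at all: str.encode("utf-8") stands in for PyUnicode_AsUTF8String(), and encoding to UTF-16-LE is only an assumption used to mimic the raw buffer on a little-endian platform with sizeof(PY_UNICODE_TYPE) = 2. The helper names (bytes_as_chars, raw_like_ucs2, utf8_bytes) are hypothetical and not part of any library.

```python
def bytes_as_chars(data):
    """Return a list of 1-character strings, one per byte in `data`."""
    return [chr(b) for b in data]


def raw_like_ucs2(text):
    # Assumption: UTF-16-LE mimics the in-memory layout of a unicode object
    # on a little-endian build with 2-byte code units. This is an
    # illustration, not what PyUnicode_AS_DATA() literally returns.
    return text.encode("utf-16-le")


def utf8_bytes(text):
    # Stand-in for PyUnicode_AsUTF8String(): one byte per ASCII character.
    return text.encode("utf-8")


if __name__ == "__main__":
    print("raw-like:", bytes_as_chars(raw_like_ucs2("hello")))
    print("utf-8:   ", bytes_as_chars(utf8_bytes("hello")))
```

Note that the one-byte-per-character assumption only holds for ASCII text: in UTF-8 a non-ASCII character such as "\u00e9" occupies more than one byte, so C++ code that indexes by character would need an encoding chosen to match its expectations.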