Sat, 12 Jun 2010 17:33:13 -0700, Dan Roberts wrote:
[clip: refactoring PyArray_Scalar]
> There are a few problems with this. The biggest problem for me is
> that it appears PyUCS2Buffer_FromUCS4() doesn't produce UCS2 at all, but
> rather UTF-16 since it produces surrogate pairs for code points above
> 0xFFFF. My first question is: is there any time when the data produced
> by PyUCS2Buffer_FromUCS4() wouldn't be parseable by a standards
> compliant UTF-16 decoder?
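
To make the surrogate-pair behaviour in the quoted question concrete, here is a rough Python sketch. It is not the NumPy C code: `ucs4_to_utf16_units` is a hypothetical helper written only for this illustration, and it assumes the input contains valid code points (no lone surrogates). It splits code points above 0xFFFF into surrogate pairs and then checks that Python's own UTF-16 decoder reads the result back.

```python
import struct


def ucs4_to_utf16_units(code_points):
    """Hypothetical helper: map UCS-4 code points to 16-bit code units.

    Code points above 0xFFFF become a high/low surrogate pair, which is
    what makes the output UTF-16 rather than plain UCS-2.
    """
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


# 'a' followed by U+1D11E (MUSICAL SYMBOL G CLEF), which lies above 0xFFFF.
text = "a\U0001D11E"
units = ucs4_to_utf16_units([ord(ch) for ch in text])

# Pack the code units as little-endian 16-bit values and decode as UTF-16.
raw = struct.pack("<%dH" % len(units), *units)
decoded = raw.decode("utf-16-le")

print([hex(u) for u in units])
print(decoded == text)
```

A strict UCS-2 consumer would instead treat the two surrogate code units as two separate (invalid) characters, which is where the two encodings differ.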
Since UTF-16 = UCS-2 + surrogate pairs, the data produced should, as far
as I know, always be parseable by DecodeUTF16. Converting UCS-4 to real
UCS-2 would be lossy, since code points above 0xFFFF cannot be
represented in 2 bytes.

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion