Hi Greg,
Stefan Behnel wrote:
> Greg Ewing wrote:
>> One of the arguments being used against automatic
>> unicode->char * using ascii as the encoding seems
>> to be that it can cause your module to fail at run
>> time.
>>
>> But how is this different from using an explicit
>> encoding operation? It can *still* fail at run time
>> if the unicode string passed can't be represented
>> in the chosen encoding.
>
> you can't be sure that you are actually looking at an ASCII-compatible
> string (i.e. ISO or UTF-8 encoded)
Sorry, you were actually talking about the unicode -> char* case, where it
can easily be checked that only ASCII characters are used.
c_ptr = PyString_AsString(PyUnicode_AsASCIIString(s))
would do the right thing, modulo a NULL check (PyUnicode_AsASCIIString()
returns NULL for non-ASCII input) and keeping a reference to the
intermediate string object alive for as long as c_ptr is used. The
opposite case would be this, then?
s = PyUnicode_DecodeASCII(c_ptr, strlen(c_ptr), NULL)
How would you deal with null bytes in the string? (Although I guess that's
not a valid use case anyway).
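For illustration, here is a rough Python-level sketch of the semantics of the
two conversions above (the helper names are mine, not C-API functions): the
encode step fails for non-ASCII input, and the strlen()-based decode step
silently truncates at the first null byte.

```python
def unicode_to_char_ptr(s):
    """Mimic PyUnicode_AsASCIIString(): fails for non-ASCII input."""
    return s.encode('ascii')  # raises UnicodeEncodeError on non-ASCII

def char_ptr_to_unicode(raw):
    """Mimic PyUnicode_DecodeASCII(c_ptr, strlen(c_ptr), NULL).

    strlen() stops at the first null byte, so any data after an
    embedded NUL is silently dropped.
    """
    length = raw.index(b'\x00') if b'\x00' in raw else len(raw)
    return raw[:length].decode('ascii')
```

So the null-byte question matters exactly when the length is taken via
strlen() rather than passed in explicitly.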
But there is still the argument that Py3 no longer does this for
unicode->bytes coercion...
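As a reminder of what that Py3 change looks like in practice: mixing the two
types no longer coerces via ASCII, it just raises.

```python
def can_mix_bytes_and_str():
    """Check whether bytes and str coerce implicitly (Py2: yes, Py3: no)."""
    try:
        b"abc" + "def"  # Py2 coerced via ASCII; Py3 raises TypeError
        return True
    except TypeError:
        return False
```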
And this:
> and b) this makes it very easy to write
> buggy code that works perfectly until someone passes non-ASCII characters.
isn't really helped either.
> I find it helpful to prevent writing such code right from the beginning,
> rather than requiring manual fixing when the problem comes up. I think
> that was one of the main reasons why the types were separated for Py3.
Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev