Hi Greg,
Stefan Behnel wrote:
> Greg Ewing wrote:
>> One of the arguments being used against automatic
>> unicode->char * using ascii as the encoding seems
>> to be that it can cause your module to fail at run
>> time.
>>
>> But how is this different from using an explicit
>> encoding operation? It can *still* fail at run time
>> if the unicode string passed can't be represented
>> in the chosen encoding.
>
> you can't be sure that you are actually looking at an ASCII-compatible
> string (i.e. ISO or UTF-8 encoded)
Sorry, you were actually talking about the unicode -> char* case, where it
can easily be checked that only ASCII characters are used.
c_ptr = PyString_AsString(PyUnicode_AsASCIIString(s))
would do the right thing, modulo a NULL check (PyUnicode_AsASCIIString()
returns NULL for non-ASCII input) and keeping a reference to the
intermediate string object alive for as long as c_ptr is used. The
opposite case would be this, then?
s = PyUnicode_DecodeASCII(c_ptr, strlen(c_ptr), NULL)
How would you deal with null bytes in the string? (Although I guess that's
not a valid use case anyway).
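For illustration, here is a rough Python-level sketch of the semantics of the
two conversions above (the helper names are mine, not C-API functions): the
encode step fails for non-ASCII input, and the strlen()-based decode step
silently truncates at the first null byte.

```python
def unicode_to_char_ptr(s):
    """Mimic PyUnicode_AsASCIIString(): fails for non-ASCII input."""
    return s.encode('ascii')  # raises UnicodeEncodeError on non-ASCII

def char_ptr_to_unicode(raw):
    """Mimic PyUnicode_DecodeASCII(c_ptr, strlen(c_ptr), NULL).

    strlen() stops at the first null byte, so any data after an
    embedded NUL is silently dropped.
    """
    length = raw.index(b'\x00') if b'\x00' in raw else len(raw)
    return raw[:length].decode('ascii')
```

So the null-byte question matters exactly when the length is taken via
strlen() rather than passed in explicitly.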
But there is still the argument that Py3 no longer does this for
unicode->bytes coercion...
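As a reminder of what that Py3 change looks like in practice: mixing the two
types no longer coerces via ASCII, it just raises.

```python
def can_mix_bytes_and_str():
    """Check whether bytes and str coerce implicitly (Py2: yes, Py3: no)."""
    try:
        b"abc" + "def"  # Py2 coerced via ASCII; Py3 raises TypeError
        return True
    except TypeError:
        return False
```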
And this:
> and b) this makes it very easy to write
> buggy code that works perfectly until someone passes non-ASCII characters.
isn't really helped either.
> I find it helpful to prevent writing such code right from the beginning,
> rather than requiring manual fixing when the problem comes up. I think
> that was one of the main reasons why the types were separated for Py3.
Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev