Re: [Cython] unicode can bite us ...

Robert Bradshaw Tue, 24 Feb 2009 14:36:15 -0800

On Feb 24, 2009, at 1:34 AM, Stefan Behnel wrote:

> Robert Bradshaw wrote:
>> On Feb 24, 2009, at 12:29 AM, Stefan Behnel wrote:
>>> Well, at least, that's what's written in the code: a byte string.
>>> What I'm saying is that /requiring/ a byte string at the interface
>>> level is wrong.
>>
>> I agree on this point. I'm not as convinced that accepting a byte
>> string is wrong though.
>
> It's not wrong to /accept/ one. But it's wrong to have your code fail
> when someone passes a unicode string.


Yes, this is what I was agreeing to.

> I normally vote for requiring unicode strings in APIs under Py3, but
> that's something that needs to be decided for each case separately.
>
>
>>>> I think you underestimate how long broken libraries will be out
>>>> there.
>>>
>>> Let's wait and see. It didn't take me very long to fix up the Py3
>>> unicode problems of lxml's API (those that were independent of
>>> Cython), so I would expect that any library can be fixed in a couple
>>> of weeks
>>
>> I don't doubt most libraries could be made Py3 unicode compliant if
>> someone were willing to spend "a couple of weeks" fixing it
>
> I was actually speaking in terms of spare-time weeks rather than
> full-time weeks.
>
> It's all about fixing APIs. What I'd advocate (for Cython code at
> least) is to pass all API string input through a helper function that
> does the right thing, and to do the same for string output. That gives
> you a single place for fixing things, and it only needs to be done
> once. Then there's evil things like file name handling, but that's
> about it.
>
> Even ParseArgs() and friends help you by accepting unicode strings for
> a char* ("s"), as long as the ASCII codec can decode them. Which is
> definitely the case for NumPy arguments, for example.
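
To make the helper-function idea above concrete, here is a minimal
sketch of what such a pair of helpers might look like. The names
to_c_bytes and from_c_bytes are hypothetical, and the UTF-8 default is
an assumption rather than something the thread has settled on. It is
written against Py3 semantics (str is unicode, bytes is the byte string).

```python
# Hypothetical helpers: the names and the UTF-8 default are assumptions
# made for illustration, not an agreed-upon Cython or library API.

def to_c_bytes(value, encoding="utf-8"):
    """Normalise API string input to a byte string suitable for C code."""
    if isinstance(value, bytes):
        # Byte strings are accepted unchanged.
        return value
    if isinstance(value, str):
        # Unicode strings are encoded instead of being rejected.
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def from_c_bytes(data, encoding="utf-8"):
    """Turn a byte string coming back from C code into a unicode string."""
    return data.decode(encoding)
```

With something like this, every API entry point would call to_c_bytes()
on its string arguments and every exit point would call from_c_bytes(),
so a later change of encoding policy touches one place only.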


>> We don't need a new syntax.
>>
>> def foo():
>>      return "Something."
>>
>> should return a str object: bytes under Py2, and unicode under Py3.
>
> :) didn't we have this discussion already?

Yep, so I'll try not to belabor the point.

> What if you wanted to pass the result of that function into C code?

Personally, I would either (1) declare once and for all that
PyUnicode objects <-> char* conversion always happens via UTF-8 or
(2) allow implicit conversion, raising an error for non-7-bit ASCII
either way.
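
As a rough illustration of how the two options would differ, here is a
sketch in plain Python (Py3 semantics assumed). The function names are
hypothetical and only stand in for whatever the compiler would do
implicitly at the PyUnicode <-> char* boundary.

```python
# Hypothetical stand-ins for the implicit conversion; names are
# illustrative only and not part of Cython.

def to_char_utf8(text):
    # Option (1): always convert via UTF-8, so any unicode string works.
    return text.encode("utf-8")


def from_char_utf8(data):
    # Option (1), reverse direction: bytes must be valid UTF-8.
    return data.decode("utf-8")


def to_char_ascii(text):
    # Option (2): implicit conversion limited to 7-bit ASCII.
    # Raises UnicodeEncodeError for anything outside that range.
    return text.encode("ascii")


def from_char_ascii(data):
    # Option (2), reverse direction.
    # Raises UnicodeDecodeError for bytes >= 0x80.
    return data.decode("ascii")
```

Under (1), a string such as u"caf\u00e9" passes through and arrives in
C as UTF-8 bytes; under (2), the same string raises an error at the
boundary, while plain ASCII input behaves identically in both.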

- Robert

