Re: [Cython] Idea for automatic encoding and decoding

Stefan Behnel Tue, 15 Dec 2009 00:07:11 -0800

Robert Bradshaw, 15.12.2009 00:40:
> I don't think users doing input validation are going to stop doing
> input validation because of an easier str -> char* conversion option.
> I'm also skeptical that having to manually do str -> bytes -> char*
> encourages input validation. Validation is good. Shunning user
> friendliness to try to enforce validation is not (in my mind) so good.


The only case I really care about here is embedded 0 bytes. Besides that case,
'bytes' and 'char*' are basically equivalent (or should be, at least),
except for memory management, which is the main advantage of the bytes type.
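
A minimal sketch of the 0 byte problem. It uses ctypes from the standard
library as a stand-in for a char* coercion; that substitution is an assumption
made for illustration, it is not Cython code:

```python
import ctypes

# A bytes object carries its own length, so the embedded 0 byte is kept.
data = b"abc\x00def"

# Read back through a 0-terminated char*, everything after the first
# 0 byte is silently lost.
seen_by_c = ctypes.c_char_p(data).value

print(len(data), len(seen_by_c))
```

The bytes object still holds all seven bytes, while the char* view ends at the
first 0 byte.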


>> I'm not sure simply saying
>>
>>    def func(bytes s):
>>        ...
>>
>> plus a global setting somewhere at the top of your code is really readable
>> enough as "this function accepts unicode strings which get converted
>> automatically". And, no, I don't think typing the input parameter as "str"
>> is what people want in most cases. I'm really leaning towards the
>> assumption that most people really *want* bytes as basic string input type
>> in their Cython code. Either that, or exactly unicode strings. Not 'str'.
>
> I agree with you for Py3, but Py2 is an important target, arguably  
> more important than Py3 at this point in time (until numpy and the  
> rest of the scientific world moves over), and will be with us for at  
> least a while longer.

In Py2, 'str' is 'bytes', and my statement certainly holds for Py2.
Honestly, what would you want with an input data type that suddenly
switches to something completely different when you compile your code in
Py3? If you want encoded bytes input in Py2, you most likely want encoded
bytes input in Py3 as well (see the Wiki page I started). And if you want
unicode in Py2, you surely want unicode in Py3.
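
A small sketch of the type switch in plain Python. The names STR_IS_BYTES and
describe_str are hypothetical, chosen only for this illustration:

```python
# True on Python 2, where 'str' is the byte string type.
# False on Python 3, where 'str' is the unicode string type.
STR_IS_BYTES = str is bytes


def describe_str():
    """Hypothetical helper: report what the name 'str' means on this runtime."""
    return "bytes" if STR_IS_BYTES else "unicode"


print(describe_str())
```

The same source line typed as 'str' therefore names two different data types,
depending on the Python version it runs under.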


> I think they're relatively orthogonal. Most of the discussion has been  
> about adding new types, new syntax, mutating objects from one type to  
> another, etc. and the semantics of doing all that are much less clear  
> than "if an encoding is needed, use this one rather than bailing..."

If that's so clear, then please answer the following: when is an encoding
needed? Is that only when coercing between char* and Python strings, or
also when coercing between bytes/unicode? Will there be a different
handling for function signatures, or will it work the same everywhere? I.e.
will a "def func(bytes b)" function always accept unicode, and what is the
way to disable that? Or will only "def func(char*)" accept unicode input?
And will the latter still accept bytes input?

Not so clear to me, at least, and certainly not obvious.
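
For comparison, here is a sketch of the explicit conversion that any automatic
scheme would have to replace. It is written for Python 3, the helper name
to_c_bytes and the UTF-8 default are assumptions for illustration, and it is
not a proposal for how Cython should behave:

```python
def to_c_bytes(s, encoding="utf-8"):
    """Hypothetical helper: return bytes that are safe to hand to a char*."""
    if isinstance(s, bytes):
        data = s                     # already encoded, passed through as-is
    elif isinstance(s, str):
        data = s.encode(encoding)    # the encoding step is visible here
    else:
        raise TypeError("expected bytes or str, got %s" % type(s).__name__)
    if b"\x00" in data:
        # A 0 byte would cut the string short once it is seen as a char*.
        raise ValueError("embedded 0 byte in string input")
    return data


print(to_c_bytes("abc"), to_c_bytes(b"abc"))
```

Each of the questions above corresponds to a branch in this function: which
input types are accepted, which encoding is used, and what happens to 0 bytes.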

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
