Robert Bradshaw, 15.12.2009 00:40:
> I don't think users doing input validation are going to stop doing
> input validation because of an easier str -> char* conversion option.
> I'm also skeptical that having to manually do str -> bytes -> char*
> encourages input validation. Validation is good. Shunning user
> friendliness to try to enforce validation is not (in my mind) so good.

The only case I really care about here is 0 bytes. Besides that case,
'bytes' and 'char*' are basically equivalent (or should be, at least),
except for memory management, which is the main advantage of the bytes
type.

>> I'm not sure simply saying
>>
>>     def func(bytes s):
>>         ...
>>
>> plus a global setting somewhere at the top of your code is really
>> readable enough as "this function accepts unicode strings which get
>> converted automatically". And, no, I don't think typing the input
>> parameter as "str" is what people want in most cases. I'm really
>> leaning towards the assumption that most people really *want* bytes
>> as basic string input type in their Cython code. Either that, or
>> exactly unicode strings. Not 'str'.
>
> I agree with you for Py3, but Py2 is an important target, arguably
> more important than Py3 at this point in time (until numpy and the
> rest of the scientific world moves over), and will be with us for at
> least a while longer.

In Py2, 'str' is 'bytes', and my statement certainly holds for Py2.
Honestly, what would you want with an input data type that suddenly
switches to something completely different when you compile your code
in Py3? If you want encoded bytes input in Py2, you most likely want
encoded bytes input in Py3 as well (see the Wiki page I started). And if
you want unicode in Py2, you surely want unicode in Py3.

> I think they're relatively orthogonal. Most of the discussion has been
> about adding new types, new syntax, mutating objects from one type to
> another, etc. and the semantics of doing all that are much less clear
> than "if an encoding is needed, use this one rather than bailing..."

If that's so clear, then please answer the following: when is an
encoding needed? Is that only when coercing between char* and Python
strings, or also when coercing between bytes/unicode? Will there be a
different handling for function signatures, or will it work the same
everywhere? I.e. will a "def func(bytes b)" function always accept
unicode, and what is the way to disable that? Or will only
"def func(char*)" accept unicode input? And will the latter still accept
bytes input?

Not so clear to me, at least, and certainly not obvious.

Stefan
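
To make the 0-byte case from this message concrete, here is a minimal
sketch in plain Python 3 (not Cython). It uses ctypes to show how a
C-style view of a buffer stops at the first 0 byte, and a hypothetical
helper, to_c_bytes(), that encodes unicode input with an assumed default
encoding and rejects embedded 0 bytes. The helper name, its signature and
the UTF-8 default are assumptions for illustration only. They do not
describe what Cython does or was proposed to do.

```python
import ctypes


def to_c_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return bytes that survive a trip through char*."""
    if isinstance(value, str):
        # Unicode input: this is the point where an encoding is needed.
        value = value.encode(encoding)
    elif not isinstance(value, bytes):
        raise TypeError("expected bytes or str, got %s" % type(value).__name__)
    if b"\x00" in value:
        # A C string ends at the first 0 byte, so the tail would be lost.
        raise ValueError("embedded 0 byte in string")
    return value


data = b"ab\x00cd"

# The buffer holds all five bytes plus a terminating 0 byte ...
buf = ctypes.create_string_buffer(data)
full = buf.raw

# ... but read as a 0-terminated C string, only the part before the
# first 0 byte is visible.
c_view = buf.value

print(len(data), full, c_view)
```

Apart from that truncation, the bytes object and the C string carry the
same content, which is the equivalence described above. Whether such a
check belongs in user code or in the compiler is exactly the open question
of this thread.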
