Dag Sverre Seljebotn, 15.05.2010 11:28:
> Stefan Behnel wrote:
>> latest cython-devel can infer the type of a for-loop variable when
>> iterating over C arrays, C pointers and Python strings. It will infer
>> Py_UNICODE for unicode strings, but plain 'object' for a bytes string, as
>> this returns sliced strings in Py2 and integers in Py3, so there is no
>> common C type. So the following will infer c to be a plain Python object:
>>
>>     cdef bytes s = b'abcdefg'
>>
>>     c = s[4]
>>     for c in s:
>>         pass
>>
>> However, this:
>>
>>     c = b'abcdefg'[4]
>>     for c in b'abcdefg':
>>         pass
>>
>> will infer 'char' for c, as the bytes literal starts off as a char* string.
>> The main problem here is that 'char' does not behave like a Python bytes
>> object at all. I doubt that iterating over bytes literals is a common use
>> case, but I'm not sure about the 'least surprising' thing to do here.
>>
>> Should we special case this to prevent breaking Python-2 semantics, or
>> should we expect that users will usually want 'char' as a result anyway?
>>
>> Both behaviours are easy to get with a simple cast, so this is really only
>> a matter of consistency and least surprise. The thing that really bites me
>> here is that the bytes type in Py3 *does* return integers on iteration. So
>> returning 'char' on indexing and iteration would be both more efficient and
>> more future proof. But it would also be impossible to keep consistent in
>> Python-2, as faking it would mean that an untyped bytes object would return
>> a substring, whereas a typed one would return an integer. And I don't
>> really want to inject a type check branch into each getitem call to
>> override that behaviour...
>>
>> So ISTM that the only way to make this consistent is to follow Python 2 for
>> now, including literals, and to accept the different (but also consistent)
>> behaviour when running in Python 3.
>>
> "In the face of ambiguity, refuse the temptation to guess"?

I'm not sure this applies here. We have existing Python semantics for this,
after all. They just differ between Python 2 and Python 3. This is just a
case where we can't easily guarantee one specific behaviour as we do not
control the type's implementation.

> I.e., I'd just disallow it from the language (that is, require a cast),
> because of this issue. I don't see iterating over string literals as
> important enough that one can't require a cast.

Indexing based on a dynamically calculated index may be somewhat more
important though, and a cast makes that a lot more ugly. Also, requiring a
cast would prevent us from compiling Python code that uses this (I guess
that makes a case for following the Py2 semantics).

Stefan
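
For reference, here is a minimal sketch of the runtime behaviour the thread is about. It is plain Python, not Cython, and it only demonstrates what the interpreter does under Python 3; the Python 2 results are given in comments. The helper name byte_at is hypothetical, chosen only for illustration, and is not part of Cython or of any proposal made in this thread.

```python
# Plain-Python sketch of the bytes semantics discussed above.
# Run under Python 3; the Python 2 behaviour is noted in comments only.

s = b'abcdefg'

# Indexing: an integer in Python 3, a length-1 string ('e') in Python 2.
indexed = s[4]

# Iteration: yields integers in Python 3, length-1 strings in Python 2.
iterated = [c for c in s]

# Slicing: a length-1 bytes object in both versions, so this is the
# form that behaves the same everywhere.
sliced = s[4:5]


def byte_at(data, i):
    """Hypothetical helper: integer value of byte i in Py2 and Py3 alike.

    It slices first (always a length-1 bytes object) and then applies
    ord(), which accepts a length-1 bytes/str in both versions.
    """
    return ord(data[i:i + 1])


if __name__ == "__main__":
    print(indexed, iterated, sliced, byte_at(s, 4))
```

Slicing followed by ord() is one way to get an integer regardless of the interpreter version, at the cost of a temporary object. It plays the same role as the explicit cast mentioned above, just at the Python level.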
