Hi,

Robert Bradshaw wrote:
> Stefan Behnel wrote:
>>> On this note, what happens when a cdef variable (like above) has
>>> non-ASCII characters in it?
>>
>> Do you mean in its name? That can't currently happen as the scanner only
>> allows pure ASCII alphanumeric identifiers.
>
> Yep, that's what I meant (and, in fact, that's what the code above is
> using).

I know, I just wanted to mention it to be sure you're aware of it. If you say
not supporting PEP 3131 is a bug, then even this may stop working one day.

I was actually considering opening C source output files with a plain ASCII
codec, just to make sure we get an error if we accidentally output any
non-ASCII characters into the C source. I couldn't do that yet, one reason
being that we currently write names as unicode strings but string literals as
encoded byte strings. That will have to be fixed when migrating Cython itself
to Py3, which is strict about the type of output streams (text vs. byte
streams). I consider encoding at the very end to be the right thing to do,
even more so the more unicode support we enable in various places.

Maybe the CCodeWriter should get dedicated methods for writing byte string
literals, unicode string literals and literal identifier names, so that we end
up with a single place to handle output encodings. You'd then pass in the
source code input encoding on creation and it would just do the right thing,
depending on the method through which the code content came in. Or, instead of
doing the string formatting outside of the call, we could pass all values in
as "*args" and do the formatting in the code writer, where we could then
convert EncodedString instances amongst the arguments, for example. I'm not
sure yet what's the right way of handling this...

>> Keyword arguments are a different thing, but allowing non-ASCII keywords in
>> function signatures will require us to write our own ParseTupleAndKeywords()
>> to keep up compatibility with Py2
>
> We wouldn't have to back-port this to Py2, it would be an error in
> this case (maybe at C compile time there would be an error raised if
> non-ascii identifiers are used).

That's a good point. In Py2, keyword arguments *must* be byte strings.
Although ASCII is not enforced, you can only pass non-ASCII keywords using
"**dict" and accept them using "**kwargs", so it would be ok IMHO if we just
generated an "#if Py2 #error" directive when we find non-ASCII keywords in the
signature.

Stefan
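
For illustration only, here is a rough Python 3 sketch of the two ideas above: a code writer with dedicated methods per kind of content that encodes to strict ASCII in a single place at the very end, and a guard that emits an "#error" for Py2 when a signature has non-ASCII keywords. Everything named here (SketchCodeWriter, its methods, py2_nonascii_keyword_guard) is hypothetical and is not Cython's actual CCodeWriter API. The only real name assumed is CPython's PY_MAJOR_VERSION macro.

```python
import io


class SketchCodeWriter:
    """Hypothetical writer: collects text and encodes once, at the end."""

    def __init__(self, source_encoding):
        # Encoding of the input source file, used to decode names that
        # arrive as byte strings.
        self.source_encoding = source_encoding
        self._buffer = io.StringIO()

    def put(self, text):
        # Plain C code that is already known to be text.
        self._buffer.write(text)

    def put_identifier(self, name):
        # Identifiers must stay pure ASCII in the generated C source.
        if isinstance(name, bytes):
            name = name.decode(self.source_encoding)
        name.encode("ascii")  # raises UnicodeEncodeError if not ASCII
        self.put(name)

    def put_bytes_literal(self, value):
        # Emit a C string literal, escaping every byte that is not
        # printable ASCII as a three-digit octal escape.
        self.put('"%s"' % "".join(self._escape_byte(b) for b in value))

    def put_unicode_literal(self, value, encoding="utf-8"):
        # Assumption of this sketch: unicode literals are stored as
        # UTF-8 encoded bytes in the C source.
        self.put_bytes_literal(value.encode(encoding))

    @staticmethod
    def _escape_byte(b):
        if b in (0x22, 0x5C):  # double quote and backslash
            return "\\" + chr(b)
        if 0x20 <= b < 0x7F:
            return chr(b)
        return "\\%03o" % b

    def getvalue(self):
        # The single place where output encoding happens. Strict ASCII
        # turns any accidental non-ASCII output into an error.
        return self._buffer.getvalue().encode("ascii")


def py2_nonascii_keyword_guard(keyword_names):
    """Return a C preprocessor guard if any keyword name is non-ASCII."""
    for name in keyword_names:
        try:
            name.encode("ascii")
        except UnicodeEncodeError:
            return (
                "#if PY_MAJOR_VERSION < 3\n"
                '#error "non-ASCII keyword arguments require Python 3"\n'
                "#endif\n"
            )
    return ""
```

The per-method design keeps the decision about how each kind of content is encoded inside the writer, so callers never deal with output encodings themselves.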
