Re: [Cython] Py3K: recent rename PyString -> PyBytes

Stefan Behnel Tue, 03 Jun 2008 22:26:53 -0700

Hi,

Robert Bradshaw wrote:
> Stefan Behnel wrote:
>>> On this note, what happens when a cdef variable (like above) has
>>> non-ASCII characters in it?
>> 
>> Do you mean in its name? That can't currently happen as the scanner only
>> allows pure ASCII alphanumeric identifiers.
>
> Yep, that's what I meant (and, in fact, that's what the code above is
> using).


I know, I just wanted to mention it to be sure you're aware of it. If you say
not supporting PEP 3131 is a bug, then even this may stop working one day.

I was actually considering opening C source output files with a plain ASCII
codec, just to make sure we get an error if we accidentally output any
non-ASCII characters into the C source. I couldn't do that yet, one reason
being that we currently write names as unicode strings but string literals as
encoded byte strings. That will have to be fixed when migrating Cython itself
to Py3, which is strict about the type of output streams (text vs. byte 
streams).
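
To illustrate the strict-codec idea, here is a minimal sketch, assuming
Python 3 style streams. The function name write_c_source is made up for this
example and is not part of Cython. It wraps a byte stream in an ASCII text
layer, so any non-ASCII character in the generated code raises an error
instead of silently reaching the C file:

```python
import io


def write_c_source(stream, text):
    """Write generated C code to a byte stream, refusing non-ASCII text.

    Hypothetical helper for illustration only.
    """
    # newline="\n" disables platform-specific newline translation.
    writer = io.TextIOWrapper(stream, encoding="ascii", errors="strict",
                              newline="\n")
    try:
        # Raises UnicodeEncodeError if text contains non-ASCII characters.
        writer.write(text)
        writer.flush()
    finally:
        # Detach so the caller keeps ownership of the underlying stream.
        writer.detach()
```

This only works once everything written is text (unicode); mixing in
pre-encoded byte strings would fail on a text stream, which is the problem
described above.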

I consider encoding at the very end to be the right thing to do, even more so
the more unicode support we enable in various places. Maybe the CCodeWriter
should get dedicated methods for writing byte string literals, unicode string
literals and literal identifier names, so that we end up with a single place
to handle output encodings. You'd then pass in the source code input encoding
on creation and it would just do the right thing, depending on the method
through which the code content came in.
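
A rough sketch of what such dedicated methods could look like. The class
SketchCodeWriter and all of its method names are hypothetical and do not
reflect the real CCodeWriter API; emitting unicode literals as escaped UTF-8
is likewise just an assumption made for the example:

```python
class SketchCodeWriter:
    """Hypothetical stand-in for a C code writer, not Cython's CCodeWriter."""

    def __init__(self, source_encoding):
        # Encoding of the source file the literals came from.
        self.source_encoding = source_encoding
        self.parts = []

    @staticmethod
    def _c_escape(data):
        # Keep printable ASCII (except '"' and '\'); escape every other
        # byte as a three-digit octal sequence.
        out = []
        for byte in data:
            if 32 <= byte < 127 and byte not in (0x22, 0x5C):
                out.append(chr(byte))
            else:
                out.append("\\%03o" % byte)
        return '"%s"' % "".join(out)

    def put(self, code):
        self.parts.append(code)

    def put_identifier(self, name):
        # Raises UnicodeEncodeError for non-ASCII identifier names.
        name.encode("ascii")
        self.parts.append(name)

    def put_bytes_literal(self, text):
        # Byte string literals keep the bytes of the source encoding.
        self.parts.append(self._c_escape(text.encode(self.source_encoding)))

    def put_unicode_literal(self, text):
        # Assumption: unicode literals are emitted as escaped UTF-8.
        self.parts.append(self._c_escape(text.encode("utf-8")))

    def getvalue(self):
        # Single place where the output is encoded, at the very end.
        return "".join(self.parts).encode("ascii")


writer = SketchCodeWriter("latin-1")
writer.put_identifier("s")
writer.put(" = ")
writer.put_bytes_literal("\u00e9")
writer.put(";")
result = writer.getvalue()
```

The caller never encodes anything itself; it only picks the method that
matches where the content came from.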

Or, instead of doing the string formatting outside of the call, we could pass
all values in as "*args" and do the formatting in the code writer, where we
could then convert EncodedString instances amongst the arguments, for example.
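
A sketch of this alternative, under the assumption that encoded strings carry
their encoding as an attribute. EncodedText below is a simplified stand-in
written for this example, not Cython's EncodedString, and FormattingWriter
and putln are hypothetical names as well:

```python
class EncodedText(str):
    """Simplified stand-in: a string that remembers its source encoding."""
    encoding = None


def make_encoded(text, encoding):
    value = EncodedText(text)
    value.encoding = encoding
    return value


class FormattingWriter:
    """Hypothetical writer that formats inside the call."""

    def __init__(self):
        self.lines = []

    @staticmethod
    def _convert(arg):
        # Only encoded strings are converted; other values pass through.
        if isinstance(arg, EncodedText) and arg.encoding is not None:
            data = arg.encode(arg.encoding)
            return "".join(
                chr(b) if 32 <= b < 127 and b not in (0x22, 0x5C)
                else "\\%03o" % b
                for b in data
            )
        return arg

    def putln(self, template, *args):
        # Formatting happens here, after the arguments were converted.
        self.lines.append(template % tuple(self._convert(a) for a in args))


writer = FormattingWriter()
writer.putln('%s = "%s";', "name", make_encoded("\u00e9", "latin-1"))
```

The trade-off compared to dedicated methods is that call sites stay short,
but the conversion depends on the type of each argument rather than on an
explicit choice by the caller.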

I'm not sure yet what's the right way of handling this...


>> Keyword arguments are a different thing, but allowing non-ASCII keywords in
>> function signatures will require us to write our own ParseTupleAndKeywords()
>> to keep up compatibility with Py2
>
> We wouldn't have to back-port this to Py2, it would be an error in
> this case (maybe at C compile time there would be an error raised if
> non-ASCII identifiers are used).

That's a good point. In Py2, keyword arguments *must* be byte strings.
Although ASCII is not enforced, you can only pass non-ASCII keywords using
"**dict" and accept them using "**kwargs", so it would be ok IMHO if we just
generated an "#if Py2 #error" directive when we find non-ASCII keywords in the
signature.
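
As a sketch of how the generator side of this could look: the function name
non_ascii_keyword_guard is invented for this example, and the error message
text is arbitrary. PY_MAJOR_VERSION is the version macro from the CPython
headers.

```python
def non_ascii_keyword_guard(keyword_names):
    """Return C preprocessor lines that reject Py2 builds, or '' if unneeded.

    Hypothetical helper for illustration only.
    """
    has_non_ascii = any(
        ord(ch) > 127 for name in keyword_names for ch in name
    )
    if not has_non_ascii:
        # Pure ASCII keywords work on both Py2 and Py3; emit nothing.
        return ""
    return (
        "#if PY_MAJOR_VERSION < 3\n"
        "  #error Non-ASCII keyword arguments require Python 3\n"
        "#endif\n"
    )


guard = non_ascii_keyword_guard(["width", "gr\u00f6\u00dfe"])
```

The generated module then still compiles unchanged under Py3, while a Py2
build stops at the preprocessor stage.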

Stefan


_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
