Re: [Cython] Idea for automatic encoding and decoding

Greg Ewing Wed, 16 Dec 2009 15:04:03 -0800

Christopher Barker wrote:
> Robert Bradshaw wrote:
> 
>>Would
>>
>>def flump(utf8 s):
>>     return s
>>
>>return a bytes object?
> 
> I would expect it to return a unicode object -- in Python, I'd expect 
> bytes+encoding to be returned as a unicode object -- it's the only way 
> not to lose the encoding information.


I've been thinking something similar myself. Perhaps there
should be a rule that the encoded-bytes types are only for
"internal" use by Cython code, and whenever one gets coerced
to a generic Python object, it gets decoded into a unicode
string.
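
A rough illustration of that rule in plain Python (not Cython, and not an actual implementation): the class `EncodedBytes` and the helper `to_python_object` are hypothetical names invented for this sketch. They stand in for an encoded-bytes type such as utf8 and for the point where a value is coerced to a generic Python object.

```python
class EncodedBytes:
    """Hypothetical stand-in for an encoded-bytes type such as utf8."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data          # raw bytes, as used "internally"
        self.encoding = encoding  # the declared encoding travels with the bytes


def to_python_object(value):
    """Model coercion to a generic Python object.

    Encoded bytes are decoded to a unicode string; anything else is
    passed through unchanged.
    """
    if isinstance(value, EncodedBytes):
        return value.data.decode(value.encoding)
    return value


s = EncodedBytes("naïve".encode("utf-8"), "utf-8")
result = to_python_object(s)   # a unicode string, not bytes
plain = to_python_object(b"raw")  # plain bytes carry no encoding, so no decoding
```

Under this model the encoding information is never lost at the boundary, because a value that knows its encoding always leaves as unicode.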

I think that would allow us to drop the C versions of the
encoded types altogether, and write things like

   cdef extern from "somewhere.h":
     char *cflump(char *)

   def utf8 flump(utf8 s):
     return cflump(s)

Advantages of this are that all the declarations are now
symmetrical and there is no need for any encoding
declarations on the C side.

A disadvantage is that it may not be obvious that flump()
actually returns a unicode string despite being declared
as returning utf8.

If you wanted it to actually return a bytes object,
you would have to write

   def bytes flump(utf8 s):
     return cflump(s)
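
The difference between the two declarations can be modelled in plain Python, under loud assumptions: `cflump` here is a made-up pure-Python stand-in for the C function, and `flump_utf8` and `flump_bytes` are hypothetical names for the two variants. This only sketches the intended semantics; it is not how Cython would generate the code.

```python
def cflump(raw: bytes) -> bytes:
    # Hypothetical stand-in for "char *cflump(char *)".
    # It only ever sees and returns raw bytes.
    return raw.upper()


def flump_utf8(s: str) -> str:
    # Models "def utf8 flump(utf8 s)": the unicode argument is encoded
    # on the way in, and the utf8 result is decoded on the way out.
    encoded = s.encode("utf-8")
    return cflump(encoded).decode("utf-8")


def flump_bytes(s: str) -> bytes:
    # Models "def bytes flump(utf8 s)": the argument is still encoded,
    # but the result is handed back as plain bytes, undecoded.
    return cflump(s.encode("utf-8"))


as_unicode = flump_utf8("héllo")
as_bytes = flump_bytes("héllo")
```

The caller passes unicode in both cases; only the declared return type decides whether a decode happens on the way out.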

>>Will there be a different
>>handling for function signatures, or will it work the same everywhere? I.e.
>>will a "def func(bytes b)" function always accept unicode,

Not under my version of the proposal -- automatic
conversion happens only between unicode and a bytes type
with a declared encoding. Unicode and plain bytes are
still incompatible.
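
A minimal sketch of that argument rule in plain Python, with hypothetical helper names (`coerce_to_plain_bytes`, `coerce_to_encoded`) invented for illustration. What should happen when something other than unicode is passed to an encoded parameter is not covered above, so rejecting it here is purely an assumption of the sketch.

```python
def coerce_to_plain_bytes(value):
    # Models a parameter declared as plain "bytes": no encoding is known,
    # so unicode cannot be converted and is rejected.
    if isinstance(value, bytes):
        return value
    raise TypeError("plain bytes parameter does not accept %s"
                    % type(value).__name__)


def coerce_to_encoded(value, encoding):
    # Models a parameter declared with an encoding (e.g. utf8):
    # unicode is encoded automatically.
    if isinstance(value, str):
        return value.encode(encoding)
    # Assumption of this sketch only: anything else is rejected.
    raise TypeError("encoded parameter expects unicode, got %s"
                    % type(value).__name__)


accepted = coerce_to_encoded("flump", "utf-8")

try:
    coerce_to_plain_bytes("flump")
    rejected = False
except TypeError:
    rejected = True
```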

-- 
Greg
