Re: [Cython] String types with Python 2.x and 3.x

Dominic Sacré Sat, 12 Sep 2009 09:43:29 -0700

On Saturday 12 of September 2009 08:27:20 Stefan Behnel wrote:
> > However, 'bytes' is not really useful in a context where an actual
> > string is expected
>
> You mean "text", I suppose? "string" is ambiguous as it can refer to
> C strings, Python byte strings and Python Unicode strings.


Yes, that's what I meant.

> > The only solution I've found to at least get most of my code
> > working is basically to use unicode for almost everything
>
> That's the way to go anyway. To make the code Unicode aware, you have
> to make it distinguish between text, encoded text and data.

Well, actually my module only needs to be able to handle ASCII, because
the protocol it implements doesn't support anything else.
So it seems weird, and in many cases very cumbersome, to use unicode
internally, especially with Py2, where usually all strings coming from
Python will not be unicode in the first place.

> The main theme is to decide if you want to work with unicode
> internally or with encoded byte strings.

Using byte strings internally seems to make much more sense to me in 
this case. In fact that was my first attempt, though not deliberately, 
but simply because that's what happened when I tried to use my 
unmodified code with Py3.
I think I'll try to go back to that approach again, and insert 
encoding/decoding wherever necessary to make sure that no unicode 
strings get in, and no byte strings get out...
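
A minimal sketch of that boundary approach, assuming an ASCII-only protocol as described above. The helper names (to_bytes, to_text, handle_message) are hypothetical and only illustrate the idea of encoding on the way in and decoding on the way out; they are not taken from the actual module.

```python
def to_bytes(value, encoding="ascii"):
    """Accept text or bytes from the caller and always return bytes."""
    if isinstance(value, bytes):
        return value
    # Non-ASCII text raises UnicodeEncodeError here, at the boundary.
    return value.encode(encoding)


def to_text(value, encoding="ascii"):
    """Turn internal bytes back into text before returning to Python code."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def handle_message(path):
    # Hypothetical entry point: bytes only on the inside, text on the outside.
    raw = to_bytes(path)
    reply = raw.upper()  # stand-in for the real protocol work on byte strings
    return to_text(reply)
```

With this layout the internal code never has to check which string type it received; only the two helpers at the edge do.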

By the way, another issue I've stumbled upon:
With Py3, str(42) does not work as one would expect, because it actually 
creates a bytes object of length 42, filled with zeroes. Should this be 
considered a bug, or is it just one of the awkward consequences of 'str' 
meaning 'bytes' with Py3?
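
For comparison, the plain Python 3 bytes constructor behaves the same way when given an integer, so the effect can be reproduced without Cython. The snippet below is only an illustration in pure Python 3; the variable names are made up for the example.

```python
n = 42

# In Python 3, calling bytes() with an integer allocates that many zero bytes.
zeros = bytes(n)

# To get the decimal digits as a byte string, format as text first, then encode.
digits = str(n).encode("ascii")

print(len(zeros))
print(digits)
```

Going through text and then encoding explicitly avoids relying on what 'str' happens to mean in the compiled module.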


Thanks,

Dominic

