On Saturday 12 of September 2009 08:27:20 Stefan Behnel wrote:
> > However, 'bytes' is not really useful in a context where an actual
> > string is expected
>
> You mean "text", I suppose? "string" is ambiguous as it can refer to
> C strings, Python byte strings and Python Unicode strings.

Yes, that's what I meant.

> > The only solution I've found to at least get most of my code
> > working is basically to use unicode for almost everything
>
> That's the way to go anyway. To make the code Unicode aware, you have
> to make it distinguish between text, encoded text and data.

Well, actually my module only needs to be able to handle ASCII, because the protocol it implements doesn't support anything else. So it seems weird, and in many cases very cumbersome, to use unicode internally, especially with Py2, where usually none of the strings coming from Python will be unicode in the first place.

> The main theme is to decide if you want to work with unicode
> internally or with encoded byte strings.

Using byte strings internally seems to make much more sense to me in this case. In fact that was my first attempt, though not deliberately, but simply because that's what happened when I tried to use my unmodified code with Py3. I think I'll go back to that approach, and insert encoding/decoding wherever necessary to make sure that no unicode strings get in, and no byte strings get out...

By the way, another issue I've stumbled upon: With Py3, str(42) does not work as one would expect, because it actually creates a bytes object of length 42, filled with zeroes. Should this be considered a bug, or is it just one of the awkward consequences of 'str' meaning 'bytes' with Py3?

Thanks,
Dominic
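For illustration, here is a minimal sketch of the "byte strings internally, encode/decode at the boundary" approach described above, written in plain Python rather than Cython. The helper names (`to_wire`, `from_wire`, `int_to_wire`) are hypothetical and not part of any existing module, and the sketch assumes the protocol is strictly ASCII. It also shows why the number case needs care: in plain Python 3, `bytes(42)` yields 42 zero bytes, which matches the behavior reported for `str(42)`, so the number is formatted as text first and then encoded.

```python
# Hypothetical boundary helpers for an ASCII-only protocol module.
# Internally everything is bytes; text is converted only at the edges.


def to_wire(value):
    """Accept text or bytes from the caller and return ASCII bytes."""
    if isinstance(value, bytes):
        # Validate only: raises UnicodeDecodeError on non-ASCII input.
        value.decode("ascii")
        return value
    # Text input: raises UnicodeEncodeError on non-ASCII characters.
    return value.encode("ascii")


def from_wire(data):
    """Convert internal ASCII bytes back to text before returning it."""
    return data.decode("ascii")


def int_to_wire(number):
    """Format an integer as ASCII digits.

    Note that bytes(42) would create 42 zero bytes instead,
    so the number goes through its text representation first.
    """
    return str(number).encode("ascii")
```

A caller could then write `to_wire("PING")` on the way in and `from_wire(reply)` on the way out, so that no unicode strings get into the module and no byte strings get out of it.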
