Maris Nartiss wrote:

> as you might already have noticed, there is a constant stream of
> issues containing keywords "encoding" or more often
> "UnicodeDecodeError". The main reason behind this is Python 2.x two
> types of text strings - byte sequence (one you get with str()) and
> Unicode (unicode()). Python 3.x will have only one - Unicode (byte
> sequence is not a string any more) thus fixing this frustrating source
> of errors.

Both versions have both types of string. In 2.x, str() and "plain"
string literals create byte strings, while unicode() and u"..." create
unicode strings. In 3.x, str() and plain string literals create
unicode strings, while bytes() and b"..." create byte strings.

The biggest differences between the two are:

a) 2.x allows implicit conversions. If you pass a byte string where a
unicode string is expected (or vice versa), the string is implicitly
converted using the default encoding (which can't be set by a script).
3.x doesn't do this; you get an exception.

b) 3.x tries quite hard to maintain the fiction that everything is
unicode. E.g. sys.argv contains unicode strings, os.environ uses
unicode strings for both keys and values, and sys.stdin/stdout/stderr
are text streams which read and write unicode data.

> Moving GRASS Python code to use Unicode internally will make it closer
> to Python 3 ready and solve largest part of errors caused by implicit
> conversion from encoded text strings to Unicode text strings.

I don't particularly care what happens with wxGUI, and using unicode
consistently would make sense there, as wx itself uses Unicode.

But if you're planning on doing this to grass.script, I'm strongly
opposed. It achieves nothing beyond making what should be wxGUI's
problem into everyone else's problem.

Pretending that everything is unicode only works so long as the rest
of the world makes sure not to dispel the illusion. Otherwise, it
fails hard. Something as simple as copying stdin to stdout fails
just because the data isn't in the assumed encoding.

Bear in mind that the C portion of GRASS (i.e. most of it) doesn't pay
any attention to encodings unless it has to. It just passes bytes
around. It doesn't care whether the bytes are in any particular
encoding, and certainly won't attempt to ensure that data written to
stdout or to files is in any particular encoding.

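To make point (a) and the stdin-to-stdout failure concrete, here is a
minimal sketch, assuming Python 3. The helper names (mixing_raises,
copy_as_text, copy_as_bytes) are made up for illustration and are not
part of grass.script, and in-memory streams stand in for stdin and
stdout so that the example is self-contained.

```python
import io
import shutil


def mixing_raises():
    """Return True if combining bytes and str raises TypeError (3.x behaviour)."""
    try:
        b"abc" + "def"  # 2.x would implicitly convert; 3.x refuses
    except TypeError:
        return True
    return False


def copy_as_text(src, dst, encoding="utf-8"):
    """Copy by decoding then re-encoding; raises if data isn't in `encoding`."""
    text = src.read().decode(encoding)  # may raise UnicodeDecodeError
    dst.write(text.encode(encoding))


def copy_as_bytes(src, dst):
    """Copy the raw bytes without interpreting them in any encoding."""
    shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    latin1_data = b"caf\xe9\n"  # Latin-1 bytes, not valid UTF-8

    print("mixing bytes and str raises:", mixing_raises())

    out = io.BytesIO()
    copy_as_bytes(io.BytesIO(latin1_data), out)
    print("byte copy preserved data:", out.getvalue() == latin1_data)

    try:
        copy_as_text(io.BytesIO(latin1_data), io.BytesIO())
    except UnicodeDecodeError as exc:
        print("text copy failed:", exc.reason)
```

In a real Python 3 script the byte-level streams are available as
sys.stdin.buffer and sys.stdout.buffer, so a pass-through copy can
avoid decoding altogether.
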
-- 
Glynn Clements <[email protected]>
_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
