On Feb 15, 2006, at 7:19 AM, Fuzzyman wrote:
> [snip..]
>
> I personally like the move towards all unicode strings, basically
> any text where you don't know the encoding used is 'random binary
> data'. This works fine, so long as you are in control of the text
> source. *However*, it leaves the following problem :
>
> The current situation (treating byte-sequences as text and assuming
> they are an ascii-superset encoded text-string) *works* (albeit
> with many breakages), simply because this assumption is usually
> correct.
>
> Forcing the programmer to be aware of encodings, also pushes the
> same requirement onto the user (who is often the source of the text
> in question).
>
> Currently you can read a text file and process it - making sure
> that any changes/requirements only use ascii characters. It
> therefore doesn't matter what 8 bit ascii-superset encoding is used
> in the original. If you force the programmer to specify the
> encoding in order to read the file, they would have to pass that
> requirement onto their user. Their user is even less likely to be
> encoding aware than the programmer.
Or the programmer can just use "iso-8859-1" and call it done. That will get you the same "I don't care" behavior as now.

James
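
To illustrate why "iso-8859-1" gives the "I don't care" behavior: a minimal sketch, assuming Python 3 semantics where bytes and text are distinct types. ISO-8859-1 assigns a code point to every byte value from 0x00 to 0xFF, so decoding never fails and re-encoding restores the original bytes exactly. The helper `replace_ascii` is a hypothetical name made up for this example, not an existing API, and the sample line is invented. It relies on the quoted assumption that the real encoding is an 8-bit ASCII superset and that the edit itself only involves ASCII characters.

```python
# Every byte value 0x00-0xFF maps to exactly one code point in ISO-8859-1,
# so the decode/encode round trip is lossless for arbitrary bytes.
raw = bytes(range(256))
text = raw.decode("iso-8859-1")
assert text.encode("iso-8859-1") == raw


def replace_ascii(data: bytes, old: str, new: str) -> bytes:
    """Hypothetical helper: apply an ASCII-only edit to bytes whose real
    encoding is unknown (assumed to be some ASCII superset)."""
    return data.decode("iso-8859-1").replace(old, new).encode("iso-8859-1")


# The sample line is actually UTF-8; its non-ASCII bytes pass through untouched.
utf8_line = "name = café\n".encode("utf-8")
edited = replace_ascii(utf8_line, "name", "title")
assert edited == "title = café\n".encode("utf-8")
```

The decoded string is not meaningful text for the non-ASCII bytes (the UTF-8 "é" shows up as two Latin-1 characters in between), so this only works as long as the program never inspects or displays those characters.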