On Feb 24, 2013, at 06:59 , Marvin Humphrey <[email protected]> wrote:
> I'd be interested whether you have anything to add to this argument from Tom
> Christiansen as to why iteration is the best model for string processing, as
> opposed to random access:
>
> http://bugs.python.org/issue12729#msg142036

I fully agree with Tom. Random access is only useful if you deal with
fixed-length records, which are rarely used these days.

This is a very interesting thread, BTW. It taught me some things I didn't
know about Unicode yet. Thanks for sharing it.

Another nice thing about iterators is that if we have to support multiple
encodings, the encoding can be abstracted behind the iterator interface. So
we can share the implementations of String methods across encodings, except
for performance-critical stuff like Hash_Sum.

> Christiansen also argues for UTF-8 as a native encoding, like Perl and Go.
> Clownfish doesn't have that option -- but if we make iteration our primary
> string processing model, we can avoid problems associated with random
> access, such as splitting logical characters.

UTF-8 is certainly superior in almost all aspects. UTF-16 is still used so
much mainly for historical reasons: many implementations started out with
UCS-2 and later upgraded to UTF-16, which was the obvious but not really
ideal choice. Switching from a fixed-width to a variable-width encoding has
a lot of implications, which have been overlooked in some programming
languages, as Tom Christiansen points out in the thread mentioned above.

Nick
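
To make the point about hiding the encoding behind an iterator concrete,
here is a minimal sketch in C. All names (StrIter, StrIter_utf8,
StrIter_utf16, count_code_points, iter_equals) are hypothetical and are not
taken from the Clownfish API. The decoders assume well-formed input and do
no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code point iterator; not the Clownfish API. */
typedef struct StrIter StrIter;
struct StrIter {
    const void *buf;  /* encoded string data */
    size_t      len;  /* length in code units */
    size_t      pos;  /* current offset in code units */
    /* Decode the next code point and advance; returns -1 at the end. */
    int32_t   (*next)(StrIter *self);
};

/* UTF-8 decoder. Assumes valid input (no error handling). */
static int32_t utf8_next(StrIter *self) {
    const uint8_t *s = (const uint8_t*)self->buf;
    if (self->pos >= self->len) { return -1; }
    uint8_t b = s[self->pos++];
    int32_t cp;
    int     extra;  /* number of continuation bytes to read */
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else                         { cp = b & 0x07; extra = 3; }
    while (extra-- > 0 && self->pos < self->len) {
        cp = (cp << 6) | (s[self->pos++] & 0x3F);
    }
    return cp;
}

/* UTF-16 decoder. Combines a surrogate pair into one code point. */
static int32_t utf16_next(StrIter *self) {
    const uint16_t *s = (const uint16_t*)self->buf;
    if (self->pos >= self->len) { return -1; }
    int32_t cp = s[self->pos++];
    if (cp >= 0xD800 && cp <= 0xDBFF && self->pos < self->len) {
        int32_t lo = s[self->pos++];
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
    }
    return cp;
}

StrIter StrIter_utf8(const uint8_t *buf, size_t len) {
    StrIter it = { buf, len, 0, utf8_next };
    return it;
}

StrIter StrIter_utf16(const uint16_t *buf, size_t len) {
    StrIter it = { buf, len, 0, utf16_next };
    return it;
}

/* Shared "String methods": these only see code points, never code units.
 * Iterators are passed by value, so the caller's position is untouched. */
size_t count_code_points(StrIter it) {
    assert(it.next != NULL);
    size_t n = 0;
    while (it.next(&it) >= 0) { n++; }
    return n;
}

int iter_equals(StrIter a, StrIter b) {
    for (;;) {
        int32_t x = a.next(&a);
        int32_t y = b.next(&b);
        if (x != y) { return 0; }
        if (x < 0)  { return 1; }  /* both reached the end together */
    }
}
```

Because callers can only step from one code point to the next, they never
land in the middle of a multi-byte sequence or between the two halves of a
surrogate pair. A hot path such as hashing could still get a hand-written
version per encoding.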
