At 18:16 +0000 2001-11-03, Markus Kuhn wrote:
>
>On Sat, 3 Nov 2001, Eli Zaretskii wrote:
>> > ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO-4.html
>>
>> This is still silent about Grep, Sort, and tr, which are
>> the utilities where the non-ASCII support should be a non-trivial
>> change.
>>
>> Basically, even after reading that page (which told me something I
>> didn't know in some cases), Unicode support in basic development
>> tools is still very much rudimentary.
>
>In practice, Perl has long ago replaced grep, sort, tr, awk, for all but
>sentimental reasons. Most of these little silly things were written as
>inefficient separate C processes before 1975 for the sole reason that the
>PDP-11 that Ritchie and Thompson used had only 64 kB RAM and couldn't
>handle any larger multi-function tools:
>
>http://www.bell-labs.com/history/unix/
>http://www.bell-labs.com/history/unix/firstport.html
>
>Today, these tiny tools mostly lead people to write extremely inefficient
>shell scripts that spend 90% of their time in fork().
>
>UTF-8 support for Perl is in an advanced state, and for some more
>experienced UTF-8 users, "grep", "sort", "tr", etc. are merely convenient
>and nostalgic shell functions or scripts that call perl to do the job.
>
>[I sometimes wish, we could give up the classic Bourne-style shell with
>it's baroque Algol-inspired syntax entirely and that perl had the few
>facilities (e.g., prompts, readline-history, compact
>command-invocation/argv/piping/redirecting notation, etc.) that are still
>missing before we can turn it into the main command-line shell.]

What a cheek, calling "the classic Bourne-style shell" "baroque" when compared to Perl! Baroque originally meant "bizarre" in French, and now it means the same as in English: irregular, grotesque, odd, singular, or pertaining to a style of music and architecture of the 17th and 18th centuries (also known as rococo) -- a style noted for excessive, extravagant ornamentation and embellishment. That fits Perl to a T, and it has nothing in common with the "classic shells" and the "tiny tools", which are clean, sparse, well crafted, streamlined, a designer's classic.

Let's face it. Perl is a powerful 4GL, and that's why people use it; it is also better suited for CGI scripts than its few contenders. Like many powerful things, it is ugly, inconsistent, quirky, and can be dangerous to the unwary. In particular, its main feature seems to be that it accomplishes almost everything as side effects of what its commands ostensibly do. Since it seems to have no consistency from command to command, and documentation on some of the side effects is sketchy to say the least, it is a nightmare for tyros. Certainly it is difficult to debug compared to "the classic Bourne-style shell", and, like Ada, it is so huge, with so many ways to do the same thing, that you need to be using it every day just to exercise a quarter of its warty features. Also, unless there has been a LOT of tuning in the last two versions, there are some classes of problems on which it isn't all that fast, either.

Like David Starner, I stick to "these little silly things" unless I really can't do what I need to without terrible contortions, or without having to write my own C programs: for the other 0.05%, I use Perl.

Making "grep", "sort", "tr", etc. UTF-8-native is not going to be a simple task, however, unless Unix/Linux/???BSD have full support, including built-in collation-sequence routines and a more elaborate locale structure than now seems to be supported.
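
To make that last point concrete, here is a minimal sketch of where byte-oriented tools go wrong on UTF-8 text. It is written in Python purely for illustration, and the helper names (bytewise_upper, dot_matches_bytes, dot_matches_chars, strip_accents_key) are made up for this example. The accent-stripping sort key is only a crude stand-in for a real locale collation routine, not a substitute for one.

```python
import re
import unicodedata


def bytewise_upper(s):
    """Mimic a byte-oriented 'tr a-z A-Z': only ASCII bytes are changed."""
    return s.encode("utf-8").upper().decode("utf-8")


def dot_matches_bytes(s):
    """Mimic a byte-oriented grep: '.' matches exactly one byte."""
    return re.fullmatch(rb".", s.encode("utf-8")) is not None


def dot_matches_chars(s):
    """A character-aware grep: '.' matches exactly one code point."""
    return re.fullmatch(r".", s) is not None


def strip_accents_key(s):
    """Crude sort key (an assumption for this sketch, not real collation):
    decompose, drop combining marks, and fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# "\u00e9" is e-acute, which takes two bytes in UTF-8.
words = ["zebra", "\u00e9clair", "apple"]

if __name__ == "__main__":
    print(bytewise_upper("caf\u00e9"))           # the accented letter is left alone
    print("caf\u00e9".upper())                   # character-aware upper-casing
    print(dot_matches_bytes("\u00e9"))           # one character, but two bytes
    print(dot_matches_chars("\u00e9"))
    print(sorted(words))                         # code-point order
    print(sorted(words, key=strip_accents_key))  # rough dictionary order
```

Sorting by code point puts the accented word after "zebra", which is what a sort without collation tables would do. Getting a proper dictionary order for every language needs the collation data and locale machinery mentioned above.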

Perhaps Markus SHOULD have said "My interest is in getting Perl UTF-8-native because I use it, because there is a lot of interest in using it for CGI programming where being UTF-8-native is needed yesterday, and because it can do all the older routines can do in a pinch. Those who see a higher priority for the classic routines should pitch in and do them themselves."

George
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
