> There is split and other functions,
> for example:
>
> toupper("aí")
> gives
> Aí
>
> My guess is that there are many more little (or not) corners where it
> doesn't work.
> We can go on and on looking for crevices and hiding the bugs further
> under the rug so that they are not evident and find everyone completely
> unaware, leave awk as it is now or really fix the problem. The first
> approach doesn't work. I am going to take the second till I have time
> to take the third which means use runes or at least revise all the
> code so that it is uniformly aware of the existence of non-ascii
> characters.
i don't understand this approach. you propose redoing a fundamental
part of awk, yet at the end you won't have solved the bug that's
bothering you.

ignoring the fact that awk is an ape program and doesn't use runes,
the problem with toupper is independent of the internal representation
of strings. as far as i can tell, ape doesn't even have towupper and
towlower. so if you provide those functions, fixing toupper and
tolower could be a 5 minute fix. and you know you won't have broken
anything else.

/sys/doc/utf.ps is worth a read. it's not too hard to think of
situations that depend on character boundaries or operate on
non-ascii characters. generally there are few. for example, rc only
bothers with character boundaries in matching.

perhaps you could build a utf testsuite for awk. make sure to use
non-latin1 languages, too.

- erik
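
To illustrate the point that toupper can be fixed without changing how strings are stored, here is a minimal sketch in C. It is not awk's code and not an ape interface: the names sketch_upper, decode, encode and utf8_upper are hypothetical, and sketch_upper is only a stand-in for a real towupper that covers ASCII and the Latin-1 lowercase letters. The string stays a plain UTF-8 byte array throughout; each character is decoded, mapped, and re-encoded, and malformed bytes are copied through unchanged.

```c
#include <stddef.h>

/* hypothetical stand-in for towupper: ASCII and Latin-1 letters only */
static unsigned long
sketch_upper(unsigned long c)
{
	if(c >= 'a' && c <= 'z')
		return c - 0x20;
	if(c >= 0xE0 && c <= 0xFE && c != 0xF7)	/* U+00E0..U+00FE, skipping the division sign */
		return c - 0x20;
	return c;
}

/* decode one UTF-8 sequence at s; return its byte length, or 0 if malformed */
static int
decode(const unsigned char *s, unsigned long *c)
{
	int i, n;

	if(s[0] < 0x80){
		*c = s[0];
		return 1;
	}
	if((s[0] & 0xE0) == 0xC0){ n = 2; *c = s[0] & 0x1F; }
	else if((s[0] & 0xF0) == 0xE0){ n = 3; *c = s[0] & 0x0F; }
	else if((s[0] & 0xF8) == 0xF0){ n = 4; *c = s[0] & 0x07; }
	else
		return 0;
	for(i = 1; i < n; i++){
		/* a NUL terminator fails this check, so we never read past the end */
		if((s[i] & 0xC0) != 0x80)
			return 0;
		*c = (*c << 6) | (s[i] & 0x3F);
	}
	return n;
}

/* encode code point c as UTF-8 into out; return the number of bytes written */
static int
encode(unsigned long c, unsigned char *out)
{
	if(c < 0x80){
		out[0] = c;
		return 1;
	}
	if(c < 0x800){
		out[0] = 0xC0 | (c >> 6);
		out[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	if(c < 0x10000){
		out[0] = 0xE0 | (c >> 12);
		out[1] = 0x80 | ((c >> 6) & 0x3F);
		out[2] = 0x80 | (c & 0x3F);
		return 3;
	}
	out[0] = 0xF0 | (c >> 18);
	out[1] = 0x80 | ((c >> 12) & 0x3F);
	out[2] = 0x80 | ((c >> 6) & 0x3F);
	out[3] = 0x80 | (c & 0x3F);
	return 4;
}

/* upper-case the UTF-8 string in into out (at most outsz bytes, NUL included) */
void
utf8_upper(const char *in, char *out, size_t outsz)
{
	const unsigned char *s = (const unsigned char *)in;
	unsigned char *d = (unsigned char *)out;
	unsigned long c;
	int n;

	if(outsz == 0)
		return;
	while(*s != '\0' && outsz > 4){	/* keep room for 4 bytes plus the NUL */
		n = decode(s, &c);
		if(n == 0){		/* malformed byte: copy it through untouched */
			*d++ = *s++;
			outsz--;
			continue;
		}
		s += n;
		n = encode(sketch_upper(c), d);
		d += n;
		outsz -= n;
	}
	*d = '\0';
}
```

Called on the example from the quoted message, utf8_upper("aí", buf, sizeof buf) upper-cases both characters instead of only the ASCII one. A character outside the stand-in table, such as a Greek letter, passes through unchanged, which is exactly the kind of gap a test suite with non-latin1 input would expose.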