Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
> > There are a lot of users out there that are not willing to pay the
> > price for increased generality.
>
> Don't you mean s/users/programmers? As a user I don't see what price I
> pay. I only see advantages in having a consistent encoding. Which,
> btw., doesn't have to be UTF-8. In an ideal world every programme would
> adhere to LC_CTYPE. But if the encoding has to be configured then I
> would also prefer UTF-8 as the default.
>
> Of course, for the programmer there might be a price to pay. And if
> he's not willing to pay it, he can't be forced, anyway.
>
> Or do you mean the user pays the price, because if the encoding is set
> to UTF-8 then performance would suffer? In that case, I'd love to see
> some real life numbers. I doubt the difference would be noticeable.
Yes, performance will suffer. We enjoyed many decades of blissfully
ignoring the difference between a character and a byte. So, while
length(str) in any language up to the 1990s was a mere subtraction, now
we must go through the string checking each byte to see if it is a
Unicode marker and subtract the appropriate number of bytes.

Also, for a very long time we didn't really care much what a buffer's
content was: everything could be printed, even if it had control
characters which made your terminal beep (with the occasional control
sequence re-injecting output into the terminal as input). Now... Well,
printing an unprintable string can cause segfaults in some cases.

-- 
Gunnar Wolf • gw...@gwolf.org • (+52-55)5623-0154 / 1451-2244
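To make the length(str) point concrete, here is a minimal sketch in Python of the byte-by-byte walk described above. It assumes the input is valid UTF-8; the function name utf8_char_count is made up for illustration and is not from any library. Bytes of the form 10xxxxxx are continuation bytes, so they are not counted as the start of a character.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in (assumed valid) UTF-8 data.

    Every code point starts with exactly one byte that is NOT of the
    form 10xxxxxx, so we count only those bytes. This is O(n) in the
    number of bytes, unlike a stored or subtracted byte length.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


encoded = "señor".encode("utf-8")

byte_length = len(encoded)             # number of bytes, known without scanning
char_count = utf8_char_count(encoded)  # number of characters, requires a full pass

print(byte_length, char_count)
```

For "señor" the byte length is 6 while the character count is 5, because "ñ" takes two bytes in UTF-8. Whether the extra pass is noticeable in practice is exactly the question raised in the quoted message; this sketch shows only where the extra work comes from, not how much it costs.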