Given the way Perl supports Unicode, you should hardly ever have to call a UTF-8 encoder or decoder explicitly. You just have to make sure that when a UTF-8 string enters Perl, it is tagged as a UTF-8 string and not as an octet string. How that happens depends on how the string gets into Perl. When opening files, for instance, you can tell Perl which charset to expect, or tell it to consult the LC_CTYPE locale.
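As a rough illustration of the file case, here is a minimal sketch, assuming Perl 5.8 or later with the PerlIO ":encoding(UTF-8)" layer. The temporary file and the sample text are made up for the example. Reading through the layer yields a character string; reading the same file with ":raw" yields octets. (The locale-driven variant would be the open pragma, "use open ':locale';", which is not shown here.)

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 file so that the example is self-contained.
# The sample text is "Gruesse" with u-umlaut and sharp s, then a euro sign.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':encoding(UTF-8)');
print {$tmp_fh} "Gr\x{FC}\x{DF}e \x{20AC}\n";
close($tmp_fh) or die "close: $!";

# Read it back through the encoding layer: Perl decodes on input,
# so the string arrives tagged as a character string.
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line = <$in>;
close($in);
chomp $line;

# Read the same file as raw octets for comparison.
open(my $raw, '<:raw', $path) or die "open $path: $!";
my $octets = <$raw>;
close($raw);
chomp $octets;

my $chars = length $line;     # number of characters
my $bytes = length $octets;   # number of octets

binmode(STDOUT, ':encoding(UTF-8)');
print "characters: $chars, octets: $bytes\n";
```

No explicit decode call appears anywhere; the layer given to open() does the work.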
Perl Unicode support before 5.8.0 was experimental, incomplete and in practice not usable. Perl 5.8.0 worked pretty smoothly for me. In my own use I discovered only a single UTF-8-related bug, to do with regular expressions, and that was fixed in 5.8.1.

  man perluniintro

I had a lot of Perl 5.0 scripts that processed UTF-8 before there was any UTF-8 support in Perl. They continue to work with "use bytes;" added, but they got significantly simpler by using Perl's Unicode facilities.

Question: What is a quick way in Perl to get a regular expression that matches all Unicode characters in the range U+0100..U+10FFFF, in other words all non-ASCII Unicode characters?

Markus

--
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
