From: "Chas. Owens" <[EMAIL PROTECTED]>

>> I believe the OP will need to identify all the characters he would like
>> to see converted, and code the conversion rules himself using the tr///
>> or s/// operator.
>
> Yes, I think there might not be any standard transformation algorithm for
> doing this, and the programs that do it apply their own transforms.
> So finally I've decided to try finding all the possible chars with
> tildes, acute or grave accents, umlauts, etc, and replace using tr//.
>
> I hope I won't have any issues, because the chars are UTF-8.

Well, then you'll probably need to identify the utf8 octet sequences
that correspond to the special characters you want to see transformed.
snip

Perl strings hold Unicode characters (stored internally as UTF-8 when
needed), but if you want to specify a character without typing it directly
(so the Perl file can still be treated as ASCII) you use its Unicode code
point instead:

my $a_with_macron = "\x{0101}"; #UTF-8 encoding is C4 81

So, knowing the UTF-8 sequences is fairly useless.
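
To illustrate the point above, here is a minimal sketch showing that the code-point escape yields one character, and that the C4 81 octets only appear once the string is encoded. It assumes the core Encode module is available; the variable names other than $a_with_macron are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character, written as a code point so the source file stays pure ASCII.
my $a_with_macron = "\x{0101}";

# length() counts characters when the string holds characters.
my $chars = length $a_with_macron;

# Encoding to UTF-8 produces the octets that would be written to a file.
my $octets = encode('UTF-8', $a_with_macron);
my $hex    = join ' ', map { sprintf '%02X', ord } split //, $octets;

print "chars=$chars octets=", length($octets), " hex=$hex\n";
```

This should print chars=1 octets=2 hex=C4 81, matching the comment on the line above.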


Ok, and if I want to use tr// to replace a set of UTF-8 chars, how can I do it?

Can I simply use
tr/astâîASTÂÎ/astaiASTAI/;

I am not sure I can, because I've tried this and something is not right, so I'll need to check again tomorrow.
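
One way this could work, as a sketch only: decode the input from UTF-8 first, then give tr// the accented letters as code-point escapes. The sample input string is hypothetical, and only the four circumflex letters from the tr// above are mapped. If the accented letters are typed literally in the source file instead, the file would need to be saved as UTF-8 and start with "use utf8;".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: UTF-8 octets as they would arrive from a file or STDIN
# (I-circumflex, "n", space, a-circumflex, i-circumflex as raw bytes).
my $octets = "\xC3\x8En \xC3\xA2\xC3\xAE";

# Decode first, so each accented letter becomes a single character.
my $text = decode('UTF-8', $octets);

# Code-point escapes keep this source file ASCII-only:
# U+00E2 a-circumflex, U+00EE i-circumflex,
# U+00C2 A-circumflex, U+00CE I-circumflex.
(my $plain = $text) =~ tr/\x{E2}\x{EE}\x{C2}\x{CE}/aiAI/;

# Encode again before printing, since output handles expect octets.
print encode('UTF-8', $plain), "\n";
```

Running tr// directly on the undecoded octets would treat each byte of a multi-byte sequence as a separate character, which may be why the first attempt misbehaved.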

I have also seen that length($string) returns the number of bytes of $string, and not the number of chars (if the string contains UTF-8 chars).

How can I get the array of UTF-8 chars and the length of the string in chars?
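
A possible approach, assuming the string currently holds raw UTF-8 octets: decode it with the core Encode module, after which length() counts characters and split // returns one element per character. The sample data below is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 octets for "a", U+00E2 and U+00EE.
my $octets = "a\xC3\xA2\xC3\xAE";

my $text  = decode('UTF-8', $octets);   # octets -> characters
my @chars = split //, $text;            # one element per character

printf "octets=%d chars=%d elements=%d\n",
    length($octets), length($text), scalar @chars;
```

This should print octets=5 chars=3 elements=3. When the data comes from a file, opening it with the ':encoding(UTF-8)' layer does the decoding on read, so no explicit decode() call is needed.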

At first I wasn't using
use bytes;
or
use utf8;

I've since tried them both, but... no change.

Thanks.

Octavian

