Re: Character Encodings

John Delacour Thu, 24 Nov 2005 01:17:02 -0800

At 9:48 am +1100 24/11/05, [EMAIL PROTECTED] wrote:

I came across characters with incorrect encodings in a web-page I was trying to validate. [Incorrect, that is, in context. I'm sure they were fine in MS Word or wherever they originated.]
So, one character is a smart, single quote:

  ’
and when I identify it using BBEdit's ASCII table, it says its "code" is 8217 and its "escape" is %2019.
What would be the representation of this character in Perl's \-syntax? I tried \x8217 and \x2019, but they didn't match in regexes. I'm obviously a bit out of my depth in terms of what that character is, to Perl. Do I need to "use utf8"? That didn't seem to work either.

If you set the encoding of your BBEdit document to UTF-8 (no BOM) then you can simply type the curly quotes and they will be written to the doc as UTF-8. Without mentioning legacy character sets, below are 4 more ways of writing the quotes. You must declare the charset in the html.



#!/usr/bin/perl
no warnings;
$examples = <<"EOE";
1. &ldquo;double&rdquo; &lsquo;single&rsquo;
2. &#x201c;double&#x201d; &#x2018;single&#x2019;
3. &#8220;double&#8221; &#8216;single&#8217;
4. \x{201c}double\x{201d} \x{2018}single\x{2019}
EOE
$f = "/tmp/quotes.html";
$_ = <<"EOT";
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <title>Quotes</title>
</head>
<body>
<pre><font face="Georgia">
xxx
</font></pre>
</body>
</html>
EOT
s/xxx/$examples/;
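# Write through a UTF-8 layer so the \x{...} characters are encoded explicitly.
open my $fh, '>:encoding(UTF-8)', $f or die "Cannot write $f: $!";
print $fh $_;
close $fh;
# Open the result in Safari (Mac OS X only).
system 'open', '-a', 'Safari', $f;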
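
As for why \x8217 and \x2019 did not match in regexes: 8217 is the decimal code point, which is 2019 in hexadecimal, and \x without braces only reads two hex digits. Here is a rough sketch of that, assuming the page arrives as raw UTF-8 bytes; the sample string and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: the UTF-8 bytes for "it", U+2019, "s",
# as they might arrive from a file read without a decoding layer.
my $bytes = "it\xE2\x80\x99s";

# Decode the bytes into a Perl character string before matching.
my $text = decode('UTF-8', $bytes);

# 8217 (decimal) and 0x2019 (hexadecimal) name the same character.
my $same = chr(8217) eq "\x{2019}" ? 1 : 0;

# \x{2019} with braces is the single character U+2019.
my $braced = $text =~ /\x{2019}/ ? 1 : 0;

# \x2019 without braces is \x20 (a space) followed by the digits "19".
my $unbraced = $text =~ /\x2019/ ? 1 : 0;

# Replace the curly quote with a decimal reference, as in form 3 above.
(my $html = $text) =~ s/\x{2019}/&#8217;/g;

binmode STDOUT, ':encoding(UTF-8)';
print "same: $same, braced: $braced, unbraced: $unbraced\n";
print "$html\n";
```

If the text is not decoded first, the regex sees the three separate bytes rather than one character, so /\x{2019}/ would most likely fail to match there as well.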
