At 9:48 am +1100 24/11/05, [EMAIL PROTECTED] wrote:
I came across characters with incorrect encodings in a web-page I
was trying to validate. [Incorrect, that is, in context. I'm sure
they were fine in MS Word or wherever they originated.]
So, one character is a smart single quote:
’
and when I identify it using BBEdit's ASCII table, it says its
"code" is 8217 and its "escape" is %2019.
What would be the representation of this character in Perl's
\-syntax? I tried \x8217 and \x2019, but they didn't match in
regexes. I'm obviously a bit out of my depth in terms of what that
character is, to Perl. Do I need to "use utf8"? That didn't seem to
work either.
If you set the encoding of your BBEdit document to UTF-8 (no BOM),
you can simply type the curly quotes and they will be written to
the document as UTF-8. Leaving legacy character sets aside, below
are four more ways of writing the quotes. You must declare the
charset in the HTML.
#!/usr/bin/perl
no warnings;
$examples = <<"EOE";
1. &ldquo;double&rdquo; &lsquo;single&rsquo;
2. &#8220;double&#8221; &#8216;single&#8217;
3. &#x201c;double&#x201d; &#x2018;single&#x2019;
4. \x{201c}double\x{201d} \x{2018}single\x{2019}
EOE
$f = "/tmp/quotes.html";
$_ = <<"EOT";
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Quotes</title>
</head>
<body>
<pre><font face="Georgia">
xxx
</font></pre>
</body>
</html>
EOT
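s/xxx/$examples/;
# Write through a UTF-8 layer so the \x{...} characters are encoded on output.
open my $fh, '>:encoding(UTF-8)', $f or die "Cannot open $f: $!";
print $fh $_;
close $fh or die "Cannot close $f: $!";
# Open the result in Safari (Mac OS X only).
system 'open', '-a', 'Safari', $f;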
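
As for matching the character in a regex: 8217 is the decimal form of
hex 2019, so the escape you want is \x{2019}, with braces. Without
braces, \x takes only two hex digits, so \x2019 means a space (0x20)
followed by the literal text "19". The pattern also only matches if the
data has been decoded into characters rather than left as raw UTF-8
bytes. "use utf8" only affects literal non-ASCII characters typed into
the script itself, so it is not needed for the \x{...} form. Here is a
minimal sketch; the sample string and the variable names are made up
for illustration, and it assumes the input is UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the raw UTF-8 bytes of "it's" written with a
# curly apostrophe (U+2019), as they would arrive from a file.
my $bytes = "it\xE2\x80\x99s";

# Decode the bytes into characters so U+2019 is one character.
my $text = decode('UTF-8', $bytes);

# \x{2019} names the code point U+2019 (decimal 8217).
my $braced = $text =~ /\x{2019}/ ? 1 : 0;

# Without braces this is \x20 (a space) followed by "19".
my $unbraced = $text =~ /\x2019/ ? 1 : 0;

# Rewrite the curly quote as a numeric HTML entity.
(my $entity = $text) =~ s/\x{2019}/&#8217;/g;

print "braced: $braced, unbraced: $unbraced, entity: $entity\n";
```

If you read the page from a file instead, opening it with
'<:encoding(UTF-8)' should do the decoding step for you.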