At 9:48 am +1100 24/11/05, [EMAIL PROTECTED] wrote:

I came across characters with incorrect encodings in a web-page I was trying to validate. [Incorrect, that is, in context. I'm sure they were fine in MS Word or wherever they originated.]

So, one character is a smart, single quote:

  ’

and when I identify it using BBEdit's ASCII table, it says its "code" is 8217 and its "escape" is %2019.

What would be the representation of this character in Perl's \-syntax? I tried \x8217 and \x2019, but they didn't match in regexes. I'm obviously a bit out of my depth in terms of what that character is, to Perl. Do I need to "use utf8"? That didn't seem to work either.


If you set the encoding of your BBEdit document to UTF-8 (no BOM), you can simply type the curly quotes and they will be written to the document as UTF-8. The code 8217 is decimal for hex 2019, so in Perl's \-syntax the character is \x{2019}, with braces. Leaving legacy character sets aside, below are 4 more ways of writing the quotes. You must declare the charset in the HTML.


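#!/usr/bin/perl
use strict;
use warnings;

# Four ways of writing curly quotes; line 4 uses Perl's \x{...} escapes.
my $examples = <<"EOE";
1. &ldquo;double&rdquo; &lsquo;single&rsquo;
2. &#x201c;double&#x201d; &#x2018;single&#x2019;
3. &#8220;double&#8221; &#8216;single&#8217;
4. \x{201c}double\x{201d} \x{2018}single\x{2019}
EOE
my $f = "/tmp/quotes.html";
my $html = <<"EOT";
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <title>Quotes</title>
</head>
<body>
<pre><font face="Georgia">
xxx
</font></pre>
</body>
</html>
EOT
$html =~ s/xxx/$examples/;
# Write through a UTF-8 layer so the \x{...} characters are encoded
# properly and do not trigger "Wide character" warnings.
open my $fh, '>:encoding(UTF-8)', $f or die "Cannot write $f: $!";
print $fh $html;
close $fh or die "Cannot close $f: $!";
system 'open', '-a', 'Safari', $f;   # Mac OS X only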
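
As for why the regexes didn't match: without braces, \x takes only two hex digits, so \x2019 means a space (\x20) followed by "19". And even \x{2019} will only match once the input has been decoded from UTF-8 bytes into characters. Here is a minimal sketch of that, assuming the page text arrives as UTF-8 bytes; the sample string and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# 8217 decimal == 0x2019 hex: U+2019 RIGHT SINGLE QUOTATION MARK.
my $rsquo = "\x{2019}";
print "same character\n" if $rsquo eq chr(8217) && ord($rsquo) == 0x2019;

# Raw bytes as they would be read from a UTF-8 file (hypothetical sample).
my $bytes = "it\xE2\x80\x99s";

# Undecoded, the quote is three separate bytes, so \x{2019} cannot match.
print "bytes:   ", ($bytes =~ /\x{2019}/ ? "match" : "no match"), "\n";

# Decoded to characters, the same pattern matches.
my $text = decode('UTF-8', $bytes);
print "decoded: ", ($text =~ /\x{2019}/ ? "match" : "no match"), "\n";

# Replace the curly quote with a plain apostrophe.
(my $plain = $text) =~ s/\x{2019}/'/g;
print "$plain\n";
```

When reading from a file, opening it with '<:encoding(UTF-8)' should do the decoding for you, so the explicit decode() call is only needed for data that arrives as raw bytes. "use utf8" is a separate matter: it only tells Perl that the script's own source text is UTF-8.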
