I thought I understood how to use Unicode support in Perl, but
evidently not. In the script below, I'm stumped as to:
1) why the regex won't match '月'.
2) why the substitution is carried out, but the result isn't in UTF-8,
nor is it UTF-8 re-encoded as UTF-8 (uncomment "#require Encode;" and
"#Encode::decode_utf8($_);" to test this).
TIA
Robin
#!/usr/bin/perl -w
use strict;
use diagnostics -verbose;
#require Encode;
binmode (DATA,":utf8");
binmode (STDOUT,":utf8");
for (<DATA>){
    if (/(<[EMAIL PROTECTED]>)/gs){
        print "match: ",$1,"\n";
        #Encode::decode_utf8($_);
        s/$1/日本の/gs;
    }elsif(/(月)/gs){
        print "match: ",$1,"\n";
        s/$1/12月/gs;
    }
    print;
}
__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<TITLE> A Web Page</TITLE>
</HEAD>
<BODY>
<BLOCKQUOTE>
<H3>日本語のnews<FONT COLOR=#FF3300>1月</FONT></H3>
... and this is a web page.
<P>
<IMG ALT="A Filler" WIDTH="450" HEIGHT="296">
<P>
hidden marker here -----><FONT
COLOR=#FF3300><[EMAIL PROTECTED]></FONT><------<BR>
</BLOCKQUOTE>
</BODY>
</HTML>
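
A possible explanation for both symptoms, offered as a sketch rather than a confirmed diagnosis: the script has no "use utf8;", so Perl treats the literals '月' and '12月' in the source as raw bytes, while the lines read from DATA are decoded to characters by the :utf8 layer. A three-byte pattern cannot match the single character U+6708, and a byte-string replacement gets encoded a second time on output. The example below assumes that reading; the helper name fix_line and the sample input bytes are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # decode non-ASCII literals in this source file to characters
use Encode qw(decode);

# Encode characters back to UTF-8 bytes on output.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: replace every '月' in a decoded line with '12月'.
sub fix_line {
    my ($line) = @_;
    if ($line =~ /(月)/) {
        my $found = $1;                    # copy $1 before running another regex
        $line =~ s/\Q$found\E/12月/g;      # \Q...\E keeps the captured text literal
    }
    return $line;
}

# Assumed sample input: the UTF-8 bytes of "<H3>月</H3>", as they would arrive from a file.
my $bytes = "<H3>\xE6\x9C\x88</H3>";
my $text  = decode('UTF-8', $bytes);       # bytes -> characters
print fix_line($text), "\n";
```

The design choice here is to keep everything inside the program as characters: decode on input, declare the source encoding with "use utf8;", and encode once on output. The \Q...\E around the captured text also matters for the other branch of the original script, where the captured string contains regex metacharacters.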