I thought I understood how to use Unicode support in Perl, but evidently not. In the script below, I'm stumped as to:

1) why the regex won't match '月'.
2) why the substitution is carried out, but the result is neither valid UTF-8 nor UTF-8 that has been encoded a second time (to test this, uncomment the "#require Encode;" and "#Encode::decode_utf8($_);" lines).
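
A reduced test case may help isolate both symptoms. It is only a sketch and rests on one assumption: that without "use utf8" the literals in the script are seen as raw bytes, while lines read from DATA through the :utf8 layer are decoded characters. The variable names are made up for illustration, and the character is written as byte escapes so that the encoding of the file itself cannot affect the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes of U+6708 (月), written as escapes on purpose.
my $bytes = "\xE6\x9C\x88";

# What a handle with a :utf8 layer hands back: one decoded character.
my $chars = decode('UTF-8', $bytes);

my $byte_len = length $bytes;   # 3
my $char_len = length $chars;   # 1

# A pattern built from the raw bytes does not match the decoded text ...
my $byte_pattern_matches = ($chars =~ /\Q$bytes\E/) ? 1 : 0;   # 0
# ... while a pattern built from the decoded character does.
my $char_pattern_matches = ($chars =~ /\Q$chars\E/) ? 1 : 0;   # 1

# Mixing raw bytes into a character string and encoding on output
# treats each byte as a Latin-1 character, so each of the three
# bytes becomes two bytes in the output (3 + 6 = 9 in total).
my $mixed_len = length encode('UTF-8', $chars . $bytes);       # 9

print "bytes: $byte_len, chars: $char_len\n";
print "byte pattern matches: $byte_pattern_matches\n";
print "char pattern matches: $char_pattern_matches\n";
print "encoded length of mixed string: $mixed_len\n";
```

If the assumption holds, the byte pattern failing to match corresponds to question 1, and the nine-byte output corresponds to the garbled substitution result in question 2.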



TIA


Robin



#!/usr/bin/perl -w

use strict;
use diagnostics -verbose;
#require Encode;

binmode(DATA, ":utf8");
binmode(STDOUT, ":utf8");

for (<DATA>) {
    if (/(<[EMAIL PROTECTED]>)/gs) {
        print "match: ", $1, "\n";
        #Encode::decode_utf8($_);
        s/$1/日本の/gs;
    } elsif (/(月)/gs) {
        print "match: ", $1, "\n";
        s/$1/12月/gs;
    }

    print;
}


__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
        <TITLE> A Web Page</TITLE>  
</HEAD>
<BODY>
<BLOCKQUOTE>
<H3>日本語のnews<FONT COLOR=#FF3300>1月</FONT></H3>
... and this is a web page.
<P>
<IMG ALT="A Filler" WIDTH="450" HEIGHT="296">
<P>
hidden marker here -----><FONT COLOR=#FF3300><[EMAIL PROTECTED]></FONT><------<BR>
</BLOCKQUOTE>
</BODY>
</HTML>

