I thought I'd understood how to use Unicode support in Perl, but evidently not. In the script below, I'm stumped as to:

1) why the regex won't match '月'.
2) why the substitution is carried out, but the result is neither UTF-8 nor UTF-8 that has been encoded as UTF-8 a second time (uncomment #require Encode; ... #Encode::decode_utf8($_); to test this).
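
A likely cause of 1), offered here as an assumption: DATA is read through a :utf8 layer, so each line becomes a string of characters, but without "use utf8;" a '月' typed into a script saved as UTF-8 is compiled as the three bytes E6 9C 88. The minimal sketch below spells those bytes out with \x escapes so the file stays ASCII, and uses a hand-decoded copy of the <H3> line from the data as the input. It shows that the byte form does not match the decoded text while the single character U+6708 does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# What a literal 月 compiles to WITHOUT "use utf8" in a UTF-8 source file:
# three separate byte-valued characters, E6 9C 88.
my $byte_literal = "\xE6\x9C\x88";

# What a line read through a :utf8 layer looks like: decoded characters,
# so 月 is the single character U+6708 (日本語の = U+65E5 U+672C U+8A9E U+306E).
my $decoded_line =
  "<H3>\x{65E5}\x{672C}\x{8A9E}\x{306E}news<FONT COLOR=#FF3300>1\x{6708}</FONT></H3>";

# Byte form vs. character form against the decoded text.
my $bytes_match = $decoded_line =~ /\Q$byte_literal\E/ ? 1 : 0;
my $chars_match = $decoded_line =~ /\x{6708}/          ? 1 : 0;

print "byte literal matches: $bytes_match\n";    # 0
print "character matches:    $chars_match\n";    # 1
```

If that is the cause, adding "use utf8;" near the top of the script makes a literal 月 compile as the single character U+6708, so it can match text read through the :utf8 layer.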



#!/usr/bin/perl -w

use strict;
use diagnostics -verbose;
#require Encode;

binmode (DATA,":utf8");

binmode (STDOUT,":utf8");

for (<DATA>){
        if (/(<[EMAIL PROTECTED]>)/gs){
        print "match: ",$1,"\n";
        print "match: ",$1,"\n";

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
        <TITLE> A Web Page</TITLE>  
<H3>日本語のnews<FONT COLOR=#FF3300>1月</FONT></H3>
... and this is a web page.
<IMG ALT="A Filler" WIDTH="450" HEIGHT="296">
hidden marker here -----><FONT COLOR=#FF3300><[EMAIL PROTECTED]></FONT><------<BR>
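
On the Encode test in 2): as quoted, #Encode::decode_utf8($_); is a standalone statement. One thing worth ruling out, as an assumption about how it was run, is that Encode's decode_utf8 returns the decoded string and, with its default CHECK argument, leaves its argument unchanged, so calling it in void context has no visible effect on $_. The sketch below uses a hypothetical variable $octets holding the UTF-8 bytes of 月 to show the difference between discarding and keeping the return value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical input: the UTF-8 bytes of 月 (U+6708), as a raw read would deliver them.
my $octets = "\xE6\x9C\x88";

# Void context: the decoded string is returned and thrown away,
# and $octets is not modified in place.
decode_utf8($octets);
print "after void call: length ", length($octets), "\n";            # 3

# Keeping the return value is what actually yields characters.
my $chars = decode_utf8($octets);
print "assigned result: length ", length($chars), "\n";             # 1
print "is U+6708: ", ($chars eq "\x{6708}" ? "yes" : "no"), "\n";   # yes
```

Applied to the loop, the equivalent would be assigning the result back, e.g. $_ = Encode::decode_utf8($_);, though for lines already read through a :utf8 layer a second decode is not what is wanted.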
