On Apr 30, 2005, at 7:08pm, Sherm Pendley wrote:

How are $articleWorkText and $kWord being read into your app? Perl handles a variety of text encodings, but it does need to be told about the encoding to use.

If you're reading them from a file, you need to make certain to tell Perl that the file is encoded as UTF-8 (or whatever encoding it actually uses). You can use Perl's three-argument open() for that:

open(FH, '<:utf8', '/path/to/file') or die;

$articleWorkText gets pulled out of $gArticle, which is read in as follows:

open(HTML, "<:encoding(utf8)", "$gDirInput/$gFileName")
   or die "Can't open file $gDirInput/$gFileName for reading: $!\n";
read(HTML, $gArticle, -s HTML);
close(HTML);

Which I think is more or less the same? Maybe not? $kWord is handled similarly, although it's read in a line at a time, with each line split into one or more keywords and each keyword pushed onto an array.
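
For comparison, here is a minimal sketch of both approaches side by side. The temporary file and its contents are made up for illustration and are not the real input files. As far as I understand it, the ':utf8' layer only marks incoming data as UTF-8 without validating it, while ':encoding(UTF-8)' decodes strictly, so on well-formed input the two should yield the same text:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: two lines of keywords, one containing a
# non-ASCII character, written out as raw UTF-8 bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} "caf\xC3\xA9 latte\nespresso\n";
close($tmp_fh) or die "Can't close $path: $!";

# Slurp a whole file through the given input layer.
sub slurp {
    my ($layer, $file) = @_;
    open(my $fh, "<$layer", $file) or die "Can't open $file: $!";
    local $/;                 # no record separator: read everything at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

my $lax    = slurp(':utf8', $path);
my $strict = slurp(':encoding(UTF-8)', $path);

# Read keywords a line at a time, splitting each line on whitespace.
my @keywords;
open(my $kw_fh, '<:encoding(UTF-8)', $path) or die "Can't open $path: $!";
while (my $line = <$kw_fh>) {
    chomp $line;
    push @keywords, split ' ', $line;
}
close($kw_fh);

print "both layers gave the same text\n" if $lax eq $strict;
print scalar(@keywords), " keywords, first has ",
      length($keywords[0]), " characters\n";
```

If length() reports characters rather than bytes for the accented keyword, the data was decoded on the way in, and any mismatch is more likely to come from somewhere else in the program.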

Have a look at perluniintro and perlunicode if you haven't already.

See also the -C switch in perlrun - you can use that to specify that stdin and/or stdout should be regarded as UTF-8, or make UTF-8 the default encoding for all I/O streams.
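
A rough in-script counterpart to running with "perl -CS", sketched under the assumption that the script prints decoded text to the terminal. The sample word is hypothetical, and the Encode call is only there to show the difference between characters and encoded bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Roughly what -CS does from the command line: treat the three standard
# handles as UTF-8. (Here the stricter :encoding(UTF-8) layer is used.)
binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

my $word  = "caf\x{E9}";              # a character string (hypothetical sample)
my $bytes = encode('UTF-8', $word);   # the same text as UTF-8 encoded bytes

print "$word\n";                      # encoded by the layer on the way out
print length($word), " characters, ", length($bytes), " bytes\n";
```

Without a layer on STDOUT, Perl would write that word in its internal single-byte form, or warn about a "Wide character" for code points above 255, which is often the first visible symptom of a missing encoding declaration.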

OK, I'll take a look at that. It's starting to sound as though I may have a problem somewhere other than where I think it is.



John Blumel


