On Apr 30, 2005, at 7:08pm, Sherm Pendley wrote:
How are $articleWorkText and $kWord being read into your app? Perl handles a variety of text encodings, but it does need to be told about the encoding to use.
If you're reading them from a file, you need to make certain to tell Perl that the file is UTF8 (or whatever) encoded. You can use Perl's three-argument open() for that:
open(FH, '<:utf8', '/path/to/file') or die;
$articleWorkText gets pulled out of $gArticle, which is read in as follows:
    open(HTML, "<:encoding(utf8)", "$gDirInput/$gFileName")
        or die "Can't open file $gDirInput/$gFileName for reading\n";
    read(HTML, $gArticle, -s HTML);
    close(HTML);
Which I think is more or less the same? Maybe not? $kWord is handled similarly, although it's read in a line at a time, with each line split into one or more keywords and each keyword put into an array.
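For what it's worth, here is a minimal sketch of one way to check whether the two layers produce the same string for a given file. The temporary file, its sample contents and the slurp_with() helper are made up for illustration and are not part of the script under discussion. As far as I understand the PerlIO documentation, the practical difference is that :utf8 trusts the input without validating it, while :encoding(...) runs it through a decoder.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: "cafe" with an accented e is
# 4 characters but 5 bytes when stored as UTF-8.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw');
print {$tmp} "caf\xC3\xA9\n";
close($tmp) or die "close failed: $!";

# Hypothetical helper: slurp a whole file through the given PerlIO layer.
sub slurp_with {
    my ($layer, $file) = @_;
    open(my $fh, "<$layer", $file) or die "Can't open $file: $!";
    local $/;                 # no record separator: read everything at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

my $via_utf8     = slurp_with(':utf8', $path);
my $via_encoding = slurp_with(':encoding(UTF-8)', $path);
my $via_raw      = slurp_with(':raw', $path);
chomp($via_utf8, $via_encoding, $via_raw);

# Decoded strings are counted in characters, raw strings in bytes.
printf "%-18s %d\n", ':utf8',            length($via_utf8);
printf "%-18s %d\n", ':encoding(UTF-8)', length($via_encoding);
printf "%-18s %d\n", ':raw',             length($via_raw);
print  "decoded strings match\n" if $via_utf8 eq $via_encoding;
```

If the two decoded strings match for the real input file, the choice of layer is probably not where the problem lies.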
Have a look at perluniintro and perlunicode if you haven't already.
See also the -C switch in perlrun - you can use that to specify that stdin and/or stdout should be regarded as UTF8, or make UTF8 the default encoding for all i/o streams.
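As a rough illustration, running a script as "perl -CS script.pl" marks STDIN, STDOUT and STDERR as UTF-8, according to perlrun. The sketch below does approximately the same thing from inside the script with binmode(); the sample character is just an assumption for demonstration.

```perl
use strict;
use warnings;

# Roughly what -CS arranges from the command line:
# mark the three standard streams as UTF-8.
binmode($_, ':utf8') for \*STDIN, \*STDOUT, \*STDERR;

# Hypothetical sample: a single character above U+00FF.
# Without the layer, printing it triggers a "Wide character" warning.
my $smiley = "\x{263A}";
print "$smiley\n";

# List the layers now active on STDOUT, to confirm the change took effect.
my @layers = PerlIO::get_layers(STDOUT);
print "STDOUT layers: @layers\n";
```

Doing it in the script has the advantage that the behaviour doesn't depend on how the script is invoked.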
OK, I'll take a look at that. It's starting to sound as though I may have a problem somewhere other than where I think it is.
John Blumel