I've written a bot, using LWP, to upload articles extracted from the glossary, and it works fine for articles that don't contain accented characters. Unfortunately, when an article does contain accented characters, they are corrupted during the upload. I've been able to deal with these characters where they end up in the URL as the page title in the wiki (by converting them to '%xx' myself, since URI::Escape doesn't seem to work for this), but I can't figure out how to get article content containing these characters up to the wiki without corruption.
OS = Mac OS X 10.3, perl version = 5.8.1, LWP = latest from CPAN
Here's my function to POST the articles:
sub SubmitArticle {
    # get params
    my ($refArticle) = @_;

    # retrieve an Edit page for the new article and get the edit token
    my $url = $gWikiURL . $refArticle->{title} . $gActionEdit;
    my $response = $gBot->request(GET $url);
    my ($editToken) = ($response->content =~ m/.*value="(.*?)".*name="wpEditToken"/s);

    # create & send the submission request
    $url = $gWikiURL . $refArticle->{title} . $gActionSubmit;
    $response = $gBot->request(POST $url,
        Content_Type => 'form-data',
        Content      => [wpSave      => "Save page",
                         wpSection   => "",
                         wpEdittime  => "",
                         wpEditToken => $editToken,
                         wpSummary   => $gSubmissionComment,
                         wpTextbox1  => $refArticle->{wikitext}]);

    # return the outcome based on the response status
    return $response->is_error ? $FALSE : $TRUE;
}
The text causing the problems is in $refArticle->{wikitext} and I get the following message from perl when running the script:
"Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/5.8.1/LWP/Protocol.pm line 114."
But, of course, I don't know how to correct this problem.
Any help would be greatly appreciated. I'd also like to figure out why URI::Escape doesn't escape these same characters in the URLs -- they are in $refArticle->{title}, also seen in the code above.
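For reference, here is a minimal sketch of the kind of '%xx' conversion described above for titles. It assumes the title is held as a Perl character string and that the wiki expects UTF-8 in its URLs; the name escape_title is only illustrative and is not part of the script above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original script): percent-escape a
# page title for use in a URL, assuming the wiki wants UTF-8 octets.
sub escape_title {
    my ($title) = @_;

    # Turn the character string into UTF-8 octets first, so that each
    # accented character becomes one or more bytes.
    my $bytes = encode('UTF-8', $title);

    # Replace every byte outside the URI "unreserved" set with %XX.
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;

    return $bytes;
}

print escape_title("caf\x{e9}"), "\n";   # prints caf%C3%A9
```

Encoding to octets before escaping is the important step here: escaping the characters directly would produce one '%xx' per character rather than one per UTF-8 byte.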
John Blumel