On Apr 8, 2005, at 6:54 PM, John Blumel wrote:
On Apr 8, 2005, at 7:35pm, Robert Barta wrote:
On Wed, Apr 06, 2005 at 07:46:24PM -0400, John Blumel wrote:
I've written a bot, using LWP, to upload articles extracted from the
glossary and it works fine for those that don't contain accented
characters. Unfortunately, articles containing accented characters have those
characters corrupted when they are uploaded.
Without looking at your code, this could be a number of things. Among
them: you never tell the server which encoding you are using when
uploading, the server messes things up when storing the content, or the
server messes things up when it serves the content.
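To make the first possibility concrete, here is a small sketch in Python. Python is used only to show the bytes; the bot itself is Perl/LWP. It percent-encodes the same accented text for a form POST, once as UTF-8 and once as Latin-1. The field name `text` is hypothetical. The two encodings put different bytes on the wire, so the server has to be told, or has to assume, which one was used. A `charset` parameter on the `Content-Type` header is one way a client can declare it, though whether a particular server honors that is a separate question.

```python
from urllib.parse import urlencode

# Hypothetical form field; a real wiki edit form uses its own field names.
fields = {"text": "caf\u00e9"}

# The same characters percent-encoded under two different encodings.
as_utf8 = urlencode(fields, encoding="utf-8")
as_latin1 = urlencode(fields, encoding="latin-1")

print(as_utf8)    # text=caf%C3%A9
print(as_latin1)  # text=caf%E9

# One way a client can declare the encoding of the body it sends.
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
```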
Thanks for your response. I finally solved the problem, although I
still don't understand why it works this way.
I stumbled across a fix while I was in the midst of trying out various
encoding options. I was trying the file in UTF-8 "one last time" and,
after removing some encoding statements from my bot's source file,
forgot to save the last changes before running it. (I had saved some
earlier changes.) As it turned out, it worked, although I don't really
understand why (it could be a weird MediaWiki or Mac OS X quirk). But
here's what did work.
The input files have to be saved in UTF-8, which is not a problem since
TextEdit (I'm on Mac OS X 10.3) can save in any of the many encodings
supported by the system. Then the files must be read in with just a
normal open() with no special encoding parameter. Then comes the strange
part. Once the contents are read in, I must encode them as 'latin1' before
submitting the article (and, obviously, I have to do this to the title
*before* escaping it). If I skip the latin1 encoding, it doesn't work.
I don't understand why, since I'm submitting to a UTF-8 server
application. It may mean I've got something else not quite right.
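One plausible explanation for why the 'latin1' step helps is unconfirmed for this bot. Suppose the raw UTF-8 bytes read by a plain open() end up being treated as characters somewhere along the way, with each byte becoming the Latin-1 character of the same value. Encoding those characters as Latin-1 then turns them back into exactly the original bytes. Encoding them as UTF-8 a second time instead double-encodes them, which produces the familiar "Ã©"-style garbage. The Python sketch below models only that byte-level round trip, not Perl's internals or LWP. It also shows why the conversion has to happen before the title is percent-escaped.

```python
from urllib.parse import quote

# Text as saved by the editor: UTF-8 bytes on disk.
original = "caf\u00e9"
raw_bytes = original.encode("utf-8")          # b'caf\xc3\xa9'

# Model of bytes read without an encoding layer and then treated as
# characters: every byte maps to the Latin-1 character with the same value.
byte_chars = raw_bytes.decode("latin-1")      # 'cafÃ©'

# Encoding as Latin-1 recovers exactly the original UTF-8 bytes...
recovered = byte_chars.encode("latin-1")

# ...whereas encoding as UTF-8 again produces double-encoded bytes.
double_encoded = byte_chars.encode("utf-8")

print(recovered == raw_bytes)                 # True
print(double_encoded)                         # b'caf\xc3\x83\xc2\xa9'
print(double_encoded.decode("utf-8"))         # cafÃ©

# Escaping a title: converting first escapes the original bytes, while
# escaping the byte-characters directly escapes the double-encoded form.
print(quote(recovered))                       # caf%C3%A9
print(quote(byte_chars))                      # caf%C3%83%C2%A9
```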
Someone else suggested that they had eliminated the message
"Parsing of undecoded UTF-8 will give garbage when decoding
entities at /Library/Perl/5.8.1/LWP/Protocol.pm line 114."
by upgrading to a newer version of Perl (I'm at 5.8.1). I'm not too
worried about this at the moment. It's "unrelated" to the submission,
since it occurs when retrieving an edit page to get a "token", and the
program keeps going. Still, I may look into a Perl upgrade for Mac OS X
at my earliest opportunity.
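The wording of that warning hints at what goes wrong: entities are being expanded in text that is still raw UTF-8 bytes. The Python sketch below is an analogy, not HTML::Parser's actual code. It shows the mixed result you get when entities are expanded before the bytes are decoded. The entity turns into a proper "é", but the literal UTF-8 "é" next to it stays garbled.

```python
import html

# A response body as it arrives on the wire: UTF-8 bytes plus an HTML entity.
body = "caf\u00e9 &eacute;".encode("utf-8")

# Decode first, then expand entities: both forms of "é" come out the same.
good = html.unescape(body.decode("utf-8"))

# Treat the bytes as Latin-1 characters (i.e. never decode the UTF-8) and
# then expand entities: the entity becomes "é" but the raw bytes stay garbled.
bad = html.unescape(body.decode("latin-1"))

print(good)  # café é
print(bad)   # cafÃ© é
```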
Wow. This sure sounds like alchemy. Are you sure this is programming?
peajoe.