Hi, I’m trying to normalize a filehandle of unknown
encoding to UTF-8. There is a lot of documentation about changing and converting
data formats, but nothing I’ve tried works. Here is my problem and what I
tried to do to solve it.

I have a form upload that allows my clients to upload address
books in different formats. Quite a few people are trying to upload LDIF files
exported from MS Outlook, and often there are internationalized characters in
the windows-1252 character set. Here is an example of what I mean: Bjørn Stabel.

I have a file handle for the upload field (it’s an
IO::File object), and I thought I could force the filehandle to convert itself
to UTF-8 ‘on the fly’, based on some of the examples and reading I’ve
done in the various PerlIO and encoding man pages. However, nothing I do seems
to work. Here’s what I’ve tried (assume $fh is the IO::File object):

    binmode($fh, ':utf8') or die "trouble $!";
    binmode($fh, ':encoding(utf8)') or die "trouble $!";

Now, I can’t use ':encoding(latin1)' instead, because
only some files are encoded this way.

I run into trouble when I try to insert fields from the address book
into my UTF-8 PostgreSQL database. Right now I can fix it with encode_utf8(…),
but I have to use that on every single recovered value that gets inserted into
the database, so it really seems like an ugly workaround. Isn’t there some way to normalize a filehandle of
unknown encoding to UTF-8? All the examples I see seem to suggest this is
possible, but I just can’t make it work.

Thank you for your suggestions,
John Napiorkowski
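A note on why the binmode attempts above do not convert anything: an ':encoding(X)' layer decodes bytes that are already in encoding X; it does not detect the input encoding or translate arbitrary input into UTF-8. The plain ':utf8' layer does even less, since it simply treats the bytes as UTF-8 without validating them. The sketch below uses only core Perl and the Encode module. The sample upload is a made-up in-memory string standing in for the real IO::File handle. It shows that "ø" is a single byte (0xF8) in windows-1252 but two bytes in UTF-8, and that windows-1252 bytes decode correctly only through a layer naming that encoding (Encode calls it 'cp1252').

```perl
use strict;
use warnings;
use Encode qw(encode);

# "ø" is U+00F8; its bytes depend on the encoding used to store it.
my $cp1252_bytes = encode('cp1252', "\x{F8}");   # one byte
my $utf8_bytes   = encode('UTF-8',  "\x{F8}");   # two bytes
print 'cp1252: ', unpack('H*', $cp1252_bytes),
      '  UTF-8: ', unpack('H*', $utf8_bytes), "\n";

# Simulate an uploaded windows-1252 file with an in-memory filehandle
# (assumption: stands in for the real upload handle).
my $upload = "Bj\xF8rn Stabel\n";
open(my $fh, '<', \$upload) or die "open: $!";

# The layer must name the encoding the bytes are actually in.
binmode($fh, ':encoding(cp1252)') or die "binmode: $!";
my $line = <$fh>;
close($fh);
chomp $line;

# $line is now a character string; the third character is U+00F8 ("ø").
printf "decoded %d characters, third is U+%04X\n",
    length($line), ord(substr($line, 2, 1));
```

Reading the same bytes through ':encoding(utf8)' would not yield "ø"; Perl would report the byte 0xF8 as malformed UTF-8 instead, which matches the symptoms described above.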
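One common way to handle "only some files are windows-1252" is to guess per file: try a strict UTF-8 decode of the raw bytes, and if that fails, assume windows-1252. Then rewind and push the matching ':encoding(...)' layer, so every later read from the handle returns decoded text and nothing has to be fixed per value. The following is a minimal sketch, not a general encoding detector. It assumes every upload is either UTF-8 or windows-1252, that the upload handle is seekable (for example, a temporary file), and that the file fits comfortably in memory. The helper names guess_utf8_or_cp1252 and normalize_fh are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper. Assumption: input is either UTF-8 or windows-1252.
sub guess_utf8_or_cp1252 {
    my ($bytes) = @_;
    # FB_CROAK: die on malformed UTF-8 instead of substituting U+FFFD.
    # LEAVE_SRC: do not consume $bytes during the check.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'UTF-8' : 'cp1252';
}

# Hypothetical helper: slurp the (seekable) handle as raw bytes, pick an
# encoding, rewind, and push the matching decoding layer.
sub normalize_fh {
    my ($fh) = @_;
    binmode($fh, ':raw') or die "binmode: $!";
    my $bytes = do { local $/; <$fh> };
    $bytes = '' unless defined $bytes;
    my $enc = guess_utf8_or_cp1252($bytes);
    seek($fh, 0, 0) or die "seek: $!";
    binmode($fh, ":encoding($enc)") or die "binmode: $!";
    return $enc;
}

# Sample data: one windows-1252 "upload" and one UTF-8 "upload".
my %uploads = (
    cp1252 => "cn: Bj\xF8rn Stabel\n",
    utf8   => "cn: Bj\xC3\xB8rn Stabel\n",
);
my %result;
for my $name (sort keys %uploads) {
    open(my $fh, '<', \$uploads{$name}) or die "open: $!";
    my $enc  = normalize_fh($fh);
    my $line = <$fh>;
    close($fh);
    chomp $line;
    $result{$name} = [$enc, $line];
    printf "%s: detected %s, %d characters\n", $name, $enc, length $line;
}
```

Both sample files come out as the same character string. The heuristic can misfire on windows-1252 text that happens to form valid UTF-8 byte sequences, though that is rare for ordinary names. Whether an explicit encode step is still needed before inserting into PostgreSQL depends on how the database driver is configured, which this sketch does not cover.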