Hi,

 

I’m trying to normalize a filehandle of unknown encoding to UTF8.  There is a lot of documentation about changing/converting data formats but nothing I’ve tried works.  Here is my problem and what I tried to do to solve it.

 

I have an upload form that lets my clients upload address books in different formats.  Quite a few people upload LDIF files exported from MS Outlook, and these often contain international characters in the windows-1252 character set.  Here is an example of what I mean:

 

Bjørn Stabel

 

I have a filehandle for the uploaded file (it’s an IO::File object), and I thought I could force the filehandle to convert itself to UTF-8 ‘on the fly’, based on the examples and reading I’ve done in the various PerlIO and encoding man pages.  However, nothing I do seems to work.  Here’s what I’ve tried:

 

(Assume $fh is the IO::File object.)

 

binmode($fh, ":utf8") or die "trouble $!";

binmode($fh, ":encoding(utf8)") or die "trouble $!";

 

I can’t simply use ‘:encoding(latin1)’, because only some of the files are encoded that way.

 

I run into trouble when I try to insert fields from the address book into my UTF-8 PostgreSQL database.  Right now I can fix it with encode_utf8(…), but I have to use that on every single recovered value that gets inserted into the database, so it really seems like an ugly workaround.
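
To illustrate, the workaround looks roughly like the sketch below. The record, its field names (cn, mail) and the e-mail address are made up for illustration; only encode_utf8 from the Encode module is what I actually call. It seems to work only because Perl treats the undecoded bytes as Latin-1 characters, which happens to be right for the ø in the example above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical record as read from the undecoded filehandle: the "ø" in
# "Bjørn" arrives as the single windows-1252 byte 0xF8.
my %entry = (
    cn   => "Bj\xF8rn Stabel",
    mail => 'bjorn@example.com',
);

# The workaround: encode every value by hand before it goes to the database.
# After this, the "ø" is the two-byte UTF-8 sequence 0xC3 0xB8.
my %utf8_entry = map { $_ => encode_utf8($entry{$_}) } keys %entry;

# Show the resulting bytes of the name in hex, to check the conversion.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_entry{cn}), "\n";
```

Having to repeat that call for every field is the part I would like to avoid.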

 

Isn’t there some way to normalize a filehandle of unknown encoding to UTF-8?  All the examples I see seem to suggest this is possible, but I just can’t make it work.

 

Thank you for your suggestions,

 

John Napiorkowski
