Hi Peter

On Sat, 2008-09-06 at 20:49 -0500, Peter Karman wrote:
> Ron Savage wrote on 9/5/08 7:51 PM:
> > Hi Folks
> >
> > Here is the set up (details below):
> > o An fcgid script calls...
> > o A module based on CGI::Application::Dispatch, which calls...
> > o My module, which reads country names from Postgres and displays them
> >
> > This works, so Ivory Coast is displayed as 'CÔte D'ivoire' (ignoring the
> > upper-case O with caret for the moment).
> >
> > But when the first module above is installed as a mod_perl handler,
> > and /that/ calls my module, the output is 'CÔte D'ivoire'.
> >
> > I find this scary, and would love an explanation.
> >
>
> Sounds like a typical encoding issue. The 'bad' display above is likely because
> you are sending utf8 encoded strings to the browser but claim that the charset
> is latin1.
>
> IMO, the best route is all utf8, all the time. Store strings encoded as utf8 in
> your db, send utf8 to the browser, and encode/decode at your program boundaries.
> It's a real b*tch to track down the problem spots in a multiple-encoding set up.
> That's why I wrote Search::Tools::UTF8 to help me. If I'm having trouble, I
> usually throw a to_utf8() function call at suspect strings and make sure I
> declare utf8 as my charset in all my http headers and output.
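The failure mode Peter describes (UTF-8 bytes sent to the browser under a Latin-1 label) can be reproduced in a few lines. This is a minimal sketch in Python rather than Perl, purely for illustration; the helper http_response() is hypothetical and not part of CGI::Application, mod_perl or Search::Tools::UTF8. It shows the mangling, then the "decode at input, encode at output, declare the same charset" discipline.

```python
# Sketch only: reproduces "UTF-8 bytes read as Latin-1" mojibake and the
# decode-at-input / encode-at-output discipline. Not the thread's Perl code.

name = "C\u00d4te D'ivoire"  # "CÔte D'ivoire" as Unicode text

# Bytes as they might come out of a UTF-8 database column (assumption).
raw = name.encode("utf-8")  # Ô becomes the two bytes C3 94

# Wrong: interpret those bytes as Latin-1, as a charset=ISO-8859-1 label implies.
mangled = raw.decode("latin-1")  # C3 -> 'Ã', 94 -> control char U+0094

# What a browser using windows-1252 for that label would show (94 -> '”').
as_cp1252 = raw.decode("cp1252")

# Right: decode at the input boundary with the encoding the bytes really use.
text = raw.decode("utf-8")


def http_response(body_text: str) -> bytes:
    """Hypothetical helper: encode at the output boundary and declare that charset."""
    body = body_text.encode("utf-8")
    headers = (
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body


if __name__ == "__main__":
    print(ascii(mangled))
    print(as_cp1252)
    print(text == name)
    print(http_response(text))
```

The second UTF-8 byte of Ô is 0x94 (148 decimal), which is consistent with the "148 dec / 94 hex" byte that find_bad_latin1 flags in the log of the failing request. Because the mangling is lossless here, mangled.encode("latin-1").decode("utf-8") recovers the original text, which is one way to confirm a double-encoding diagnosis.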
Nice to know about Search::Tools::UTF8. Thanx.

Using it, the valid output carps (as expected, since the -1 is documented):

[Mon Sep 08 10:01:29 2008] [warn] mod_fcgid: stderr: byte -1 (R) is not Latin1 (it's 82 dec / 52 hex) at /home/ron/perl.modules/Local-Sites/lib/Local/Sites/Test/Sites.pm line 73

whereas the invalid output carps:

byte 3 (�) is not Latin1 (it's 148 dec / 94 hex) at /home/ron/perl.modules/Local-Sites/lib/Local/Sites/Test/Sites.pm line 73

And in the log (valid first, then invalid):

CGIApp: ..............................
CGIApp: http://127.0.0.1/search/sites.fcgi
CGIApp: CÔTE D'IVOIRE. Encoding: UTF8 off, ASCII, 3 characters 3 bytes
CGIApp: is_flagged_utf8:
CGIApp: is_perl_utf8_string: 0
CGIApp: is_sane_utf8: 1
CGIApp: find_bad_latin1_report: -1
CGIApp: ..............................
CGIApp: http://127.0.0.1/test/sites
CGIApp: CÔTE D'IVOIRE. Encoding: UTF8 off, ASCII, 3 characters 3 bytes
CGIApp: is_flagged_utf8:
CGIApp: is_perl_utf8_string: 1
CGIApp: is_sane_utf8: 0
CGIApp: find_bad_latin1_report: 3

Since data_string_desc() reports the same thing for both, I'll abandon DBI's data_string_desc($name). But I knew there was a problem! Your module nicely demonstrates that.

Since the underlying module is the same in both cases, the question is why one calling mechanism works and the other mangles the data. I'll dig into it :-((.

-- 
Ron Savage
[EMAIL PROTECTED]
http://savage.net.au/index.html

##### CGI::Application community mailing list ################
##                                                            ##
##  To unsubscribe, or change your message delivery options,  ##
##  visit:  http://www.erlbaum.net/mailman/listinfo/cgiapp    ##
##                                                            ##
##  Web archive:   http://www.erlbaum.net/pipermail/cgiapp/   ##
##  Wiki:          http://cgiapp.erlbaum.net/                 ##
##                                                            ##
################################################################
