On Sat, 23 Feb 2002, Daisuke Maki wrote:

>
> > Ah, that's a problem I've had before :) The absolutely totally latest AxKit
> > CVS *might* fix that problem. If it doesn't, you might want to have a look at
> > AxKit::XSP::CharConv:
> >
> > <xsp:expr>
> > <char:charset-convert from='EUC'>$foo</char:charset-convert>
> > </xsp:expr>
> >
> > That should take care of converting whatever your input is to UTF-8 (you can
> > use it to convert to whatever, so long as your iconv supports it).
>
> I looked at the CVS, and I don't think this issue is addressed. And
> quite frankly, I don't want to have to add more taglibs for every
> "outside" variable that I need to use, so I have a proposition:
>
> Is it ok if we force the result of a <xsp:expr> to be converted to UTF,
> no matter what? Attached is the diff of this proposition.
I've been mulling over this for a few days now, and Robin said you deserved a reply so I'm replying ;-)

I honestly think it's the wrong approach. The patch assumes all external data sources will be in the same encoding as the XSP source. This seems wrong to me. Take the two examples I can think of, form params and DBI calls: both should theoretically (assuming you've set up your database correctly, and the modules you're using are working right) return data in UTF-8 [*]. I may be wrong on the form param issue, but it seems to me that something further down the line should be converting stuff to UTF-8 so that it plays nicely with Perl. Perl 5.8 will certainly be doing this automatically for things like file reading (where you can specify the encoding of the file when you open it, or later using binmode()).

Secondly, I think it's a bad idea because it means that if I write an XSP page in Latin1, I can *never* include characters outside that range coming from an external data source. And that's definitely a bad thing.

Anyway, I do generally think it's a bad idea, but I'm not the definitive word on this by any stretch.

Plus, if you patch XSP.pm you'll need to patch XSP::CharsetConv accordingly, so that it converts stuff to the document's character set. Which of course means that when you use CharsetConv, you'll be doing two translations: once to the doc's charset, then again to UTF-8 for libxml to be able to understand. I hope you can see how this slows down everyone's XSP pages ;-)

[*] When I say UTF-8 here, I really mean Perl's internal representation, which is supposed to be transparent to us and just be "unicode" or "characters", but in reality we know that's not realistic and that modules must turn stuff into UTF-8 to be processable by Perl.

-- 
<!-- Matt -->
<:->Get a smart net</:->
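
To make the Latin1 and double-translation points above concrete, here is a minimal sketch in plain Perl. It is not AxKit code and not the patch itself. It assumes Perl 5.8's core Encode module, and it only models the effect described above: a value forced through the document's charset before being handed on as UTF-8. The variable names and the sample value are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical value from an external source (say, a DBI row), already
# held as Perl characters: U+65E5 U+672C, two Japanese characters.
my $external = "\x{65e5}\x{672c}";

# Translation 1 (assumed model of the patched behaviour for a page
# authored in Latin1): force the value into the document's charset.
# With no CHECK argument, encode() substitutes a replacement character
# for anything the target charset cannot represent.
my $as_latin1 = encode('iso-8859-1', $external);

# Translation 2: from the document's charset to UTF-8 for libxml.
my $for_libxml = encode('UTF-8', decode('iso-8859-1', $as_latin1));

# The direct route: a single translation, straight to UTF-8.
my $direct = encode('UTF-8', $external);

printf "via Latin1: %d bytes, original recoverable: %s\n",
    length $for_libxml,
    (decode('UTF-8', $for_libxml) eq $external ? 'yes' : 'no');
printf "direct:     %d bytes, original recoverable: %s\n",
    length $direct,
    (decode('UTF-8', $direct) eq $external ? 'yes' : 'no');
```

The route through Latin1 costs an extra encode/decode pair on every value, and anything outside Latin1 is replaced before libxml ever sees it. The direct route keeps the original characters intact.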
