On Sat, 23 Feb 2002, Daisuke Maki wrote:

>
> > Ah, that's a problem I've had before :) The absolutely totally latest AxKit
> > CVS *might* fix that problem. If it doesn't, you might want to have a look at
> > AxKit::XSP::CharConv:
> >
> > <xsp:expr>
> >   <char:charset-convert from='EUC'>$foo</char:charset-convert>
> > </xsp:expr>
> >
> > That should take care of converting whatever your input is to UTF-8 (you can
> > use it to convert to whatever, so long as your iconv supports it).
>
> I looked at the CVS, and I don't think this issue is addressed. And
> quite frankly, I don't want to have to add more taglibs for every
> "outside" variable that I need to use, so I have a proposition:
>
> Is it ok if we force the result of a <xsp:expr> to be converted to UTF,
> no matter what? Attached is the diff of this proposition.

I've been mulling over this for a few days now, and Robin said you
deserved a reply so I'm replying ;-)

I honestly think it's the wrong approach. The patch assumes all external
data sources will be in the same encoding as the XSP source. This seems
wrong to me. Take the two examples I can think of, form params and DBI
calls: both should theoretically (assuming you've set up your database
correctly, and the modules you're using are working right) return data in
UTF-8 [*]. I may be wrong on the form param issue, but it seems to me that
something further down the line should be converting stuff to UTF-8 so
that it plays nicely with Perl. Perl 5.8 will certainly be doing this
automatically for things like file reading (where you can specify the
encoding of the file when you open it, or later using binmode()).
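
To make the file-reading case concrete, here is a minimal sketch of the two
options, assuming a Perl with PerlIO encoding layers and the Encode module
(5.8 or later). The EUC-JP sample text and the temp file are made up purely
for illustration; nothing here comes from AxKit itself.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical sample data: two Japanese characters, stored on disk as EUC-JP bytes.
my $text = "\x{65E5}\x{672C}";
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} encode('EUC-JP', $text);
close($tmp) or die "close failed: $!";

# Option 1: name the file's encoding when opening it.
open(my $in, '<:encoding(EUC-JP)', $path) or die "open failed: $!";
my $at_open = do { local $/; <$in> };
close($in);

# Option 2: open the file plainly, then set the encoding later with binmode().
open(my $raw, '<', $path) or die "open failed: $!";
binmode($raw, ':encoding(EUC-JP)');
my $via_binmode = do { local $/; <$raw> };
close($raw);

# Both reads give Perl character strings (2 characters), not 4 EUC-JP bytes.
print length($at_open), " ", length($via_binmode), "\n";
```

Either way the conversion happens at the point where the data enters Perl,
which is the "further down the line" I'd expect, rather than in XSP.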

Secondly, I think it's a bad idea because it means that if I write an XSP
page in Latin1, I can *never* include characters outside that range coming
from an external data source. And that's definitely a bad thing.
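
A rough sketch of why the Latin1 case hurts, using only the Encode module.
The string is a hypothetical external value, and the assumption (mine, not
something taken from the patch) is that the data has to be representable in
the page's encoding before it can be converted onwards to UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical value from an external source: "cafe" with e-acute (inside
# Latin1) followed by two Japanese characters (outside Latin1).
my $external = "caf\x{E9} \x{65E5}\x{672C}";

# Squeeze the value through Latin1: unmappable characters are replaced
# with a substitution character, so the original cannot be recovered.
my $latin1_bytes = encode('ISO-8859-1', $external);
my $back_latin1  = decode('ISO-8859-1', $latin1_bytes);

# The same value through UTF-8 survives the round trip.
my $utf8_bytes = encode('UTF-8', $external);
my $back_utf8  = decode('UTF-8', $utf8_bytes);

print "Latin1 round trip: ", ($back_latin1 eq $external ? "intact" : "lossy"), "\n";
print "UTF-8 round trip:  ", ($back_utf8  eq $external ? "intact" : "lossy"), "\n";
```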

Anyway, I do generally think it's a bad idea, but I'm not the definitive
word on this by any stretch. Plus if you patch XSP.pm you'll need to patch
XSP::CharsetConv accordingly, so that it converts stuff to the document's
character set. Which of course means that when you use CharsetConv, you'll
be doing two translations: once to the document's charset, then again to
UTF-8 for libxml to be able to understand. I hope you can see how this
slows down everyone's XSP pages ;-)

[*] When I say UTF-8 here, I really mean Perl's internal representation,
which is supposed to be transparent to us and just be "unicode" or
"characters", but in reality we know that's not realistic and that modules
must turn stuff into UTF-8 to be processable by Perl.

-- 
<!-- Matt -->
<:->Get a smart net</:->

