On Fri, Sep 01, 2000 at 10:44:10PM -0400, Greg Stark wrote:
> 
> > >> can someone suggest me the best way to build a multilanguage web site
> > >> (english, french, ..).
> > >> I'm using Apache + mod_perl + Apache::asp (for applications)
> 
> I'm really interested in what other people are doing here. We've just released
> our first cut at i18n and it's going fairly well. But so far we haven't dealt
> with the big bugaboo, character encoding. 

> One major problem I anticipate is what to do when individual include files are
> not available in the local language. For iso-8859-1 encoded languages that's
> not a major hurdle as we can simply use the english text until it's
> translated. But for other encodings does it make sense to include english
> text? 

> If we use UTF-8 all the ascii characters would display properly, but do most
> browsers support UTF-8 now? Or do people still use BIG5, EUC, etc? 

> As far as I can tell there's no way in html to indicate to the browser that a
> chunk of content is in some other encoding other than what was specified in
> the headers or meta tag. There's no <span charset=...> attribute or anything
> like that. This seems to make truly multilingual pages really awkward. You
> basically must use an encoding like UTF-8 which can reach the entire unicode
> character set or else you cannot mix languages.

It's a mess, but you're just going to have to assume multiple
character sets for the foreseeable future.  We try to use all utf8 data
sources.  XML defaults to this, Oracle can be easily set up this way,
and you can use utf8 in your html sources too.  You just have to be
careful; for example, in our message catalogs we source translations
into utf8.

Anyway, here's what's in my global.asa to take care of this character
set conversion mess.  Full details are available to those who are
interested.


In Script_OnStart we convert submitted data to utf8

  ...
  #set $Apps::Param to form data or querystring.

  # decide on character set based on submitted form data element
  # 'asp_charset', or based on user's language.

  my $charset = $Apps::Param->{'asp_charset'};
  $charset = 'x-euc-jp' if (!$charset && $Session->{"Lang"} eq 'ja');

  $charset ||= 'iso-8859-1';

  # Convert japanese to UTF8
  ... messy Jcode stuff removed..
  # convert utf8 
   ; # no-op
  # convert iso8859-1 to utf8
  ... messy Unicode::String code..

  $Response->{Charset} = $charset;
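
For anyone who wants to see the shape of the input conversion, here is a
minimal sketch. It is not the Jcode / Unicode::String code elided above; it
assumes a newer perl with the core Encode module, and the helper name
to_utf8 is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from global.asa): take raw submitted bytes in
# $charset and return the same text as UTF-8 bytes.
sub to_utf8 {
    my ($bytes, $charset) = @_;

    # Already utf8: nothing to do.
    return $bytes if lc($charset) =~ /^utf-?8$/;

    # Map the browser-style label used above to the name Encode uses.
    my $enc = lc($charset) eq 'x-euc-jp' ? 'euc-jp' : $charset;

    my $chars = decode($enc, $bytes);   # bytes in $enc -> perl characters
    return encode('UTF-8', $chars);     # perl characters -> UTF-8 bytes
}

# Example: a latin-1 "cafe" with an e-acute (byte 0xE9).
my $utf8 = to_utf8("caf\xe9", 'iso-8859-1');
```

In Script_OnStart something like this would be applied to every value in
the submitted form data before the rest of the application sees it.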


In Script_OnFlush we convert the internal utf8 data to the target charset

  my $charset = $Response->{Charset};

  # do character set conversion..
  if ($charset eq 'x-euc-jp') {
    ... messy Jcode stuff
  } elsif ($charset eq 'iso-8859-1') {
    ... Unicode::String stuff here.
  }

  # here's the tricky part:
  # Automatically add hidden charset fields to forms?
  # (non-greedy match, so each form on the page is fixed separately)
  $$data =~ s,(<form.*?</form>),formfixer($1),sige;
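
The output side is the same idea in reverse. Again this is only a sketch
that swaps in the core Encode module for the Jcode / Unicode::String code
elided above; from_utf8 is a made-up name, and it takes a scalar reference
the way $data is used in Script_OnFlush.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from global.asa): re-encode a page held as
# UTF-8 bytes into the charset chosen for the response, in place.
sub from_utf8 {
    my ($data_ref, $charset) = @_;

    # Target is utf8 already: leave the page alone.
    return if lc($charset) =~ /^utf-?8$/;

    # Map the browser-style label used above to the name Encode uses.
    my $enc = lc($charset) eq 'x-euc-jp' ? 'euc-jp' : $charset;

    my $chars = decode('UTF-8', $$data_ref);   # UTF-8 bytes -> characters
    # Characters the target charset cannot represent are replaced by a
    # substitution character under Encode's default settings.
    $$data_ref = encode($enc, $chars);
    return;
}

# Example: an e-acute stored as UTF-8, sent out as latin-1.
my $page = "caf\xc3\xa9";
from_utf8(\$page, 'iso-8859-1');
```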


Here's the formfixer thing, it adds hidden charset values to the form:

sub formfixer {
    my $form = shift;
    return($form) if ($form =~ /action="?http/);
    $form =~ s,</form>,<input type="hidden" name="asp_charset" value="$Response->{Charset}"></form>,si;
    return($form);
}
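
To try this outside of Apache::ASP, here is a small standalone sketch. The
$Response hash is only a stand-in for the real Apache::ASP object, the
sample page is invented, and the substitution uses a non-greedy match so
that each form is passed to formfixer on its own.

```perl
use strict;
use warnings;

# Stand-in for the Apache::ASP $Response object; only Charset is used here.
our $Response = { Charset => 'x-euc-jp' };

sub formfixer {
    my $form = shift;
    # Leave forms that post to an absolute http URL (another site) alone.
    return $form if $form =~ /action="?http/;
    $form =~ s,</form>,<input type="hidden" name="asp_charset" value="$Response->{Charset}"></form>,si;
    return $form;
}

# Invented sample page: one local form, one form posting elsewhere.
my $page = join "\n",
    '<form action="/search"><input name="q"></form>',
    '<form action="http://example.com/x"><input name="q"></form>';

# Run every form on the page through formfixer.
(my $fixed = $page) =~ s,(<form.*?</form>),formfixer($1),sige;
print $fixed, "\n";
```

Only the first form should pick up the hidden asp_charset field; the one
with the absolute http action is passed through unchanged.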



-- 
Paul Lindner
[EMAIL PROTECTED]
Red Hat Inc.
