On Thu, 20 Sep 2001, Toni Vila wrote:

> We have recently upgraded our AOLserver 3.0 to AOLServer 3.4 due to the
> Auth problem.
>
> But now we notice some problems with some characters (accents), which on
> the previous version did not appear.
>
> In our Oracle 8.1.6 database, we have the following text: "Cerámica
> Castellón", which should be "Cerámica Castellón".
>
> In AOLServer v3.0, it was displayed on the web correctly ("Cerámica
> Castellón"), but in AOLServer 3.4, it's just "Cerámica Castellón", and we
> don't know how to translate it back to the correct charset.
>
> It seems to me a Multibyte charset, as maybe Unicode, but although choosing
> a ORA_NLS_LANG of "_.UTF8" or similar, it doesn't change anything.
>
> I haven't changed any line in our configuration file, nor in the database
> between versions.
>
> Do you know what may have happened or how to correct it?

I have had very similar problems with different versions of AOLserver 3.x,
so here is my advice.

If you want AOLserver to work correctly with characters in encodings other
than ASCII, you have to be very careful to use the proper encoding when:

1. Talking to your database.
2. Sourcing your TCL libraries at server startup.
3. Reading your ADP pages.
4. Writing data to browser from ADP pages.
5. Reading your TCL pages.
6. Writing data to browser from TCL pages.
7. Reading form data from browser.
8. URL-encoding/decoding data.
9. Talking to your system.

For a general explanation see the ArsDigita guide "Building a
Multilingual Web Service Using ACS"
(http://www.arsdigita.com/asj/multilingual/), and in particular the
chapter "Character Set Encoding"
(http://www.arsdigita.com/asj/multilingual/encoding.adp).

Keep in mind that the solutions described there will work only with the
ArsDigita version of AOLserver (the last release is AOLserver-3.3.1+ad13 at
http://www.arsdigita.com/acs-repository/older), which will probably
no longer be maintained. In AOLserver 3.4 character encoding is handled
differently.

In general, TCL scripts have to operate on strings encoded in UTF-8,
because Tcl 8.x (8.1 and later) is designed that way. So you have to
carefully convert strings read into or written out of TCL scripts. My
examples use ISO-8859-2, which is the proper encoding for Polish, but you
should change it to something appropriate for you.
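Garbage of the kind described in the question usually appears when UTF-8 bytes are decoded as a single-byte charset (or the other way round). A minimal pure-Tcl sketch (tclsh 8.1 or later, no AOLserver needed; the sample text is taken from the question) showing a correct round trip with `encoding convertto`/`encoding convertfrom` next to the wrong decoding:

```tcl
# Tcl strings are Unicode internally; bytes only appear at the edges
# (encoding convertto/convertfrom, channels, database drivers).
set text "Cer\u00e1mica Castell\u00f3n"

# Correct round trip: encode to UTF-8 bytes, then decode them as UTF-8.
set utf8_bytes [encoding convertto utf-8 $text]
set ok [encoding convertfrom utf-8 $utf8_bytes]

# Wrong: the same UTF-8 bytes decoded as ISO-8859-1 turn each accented
# letter into two characters (U+00E1 becomes U+00C3 U+00A1).
set garbled [encoding convertfrom iso8859-1 $utf8_bytes]

puts "round trip intact: [expr {$ok eq $text}]"
puts "garbled length: [string length $garbled] (was [string length $text])"
```

Every step listed above is a place where one of these conversions happens, explicitly or implicitly.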

Ad. 1 Talking to your database

First you have to ensure that AOLserver, as a database client, writes and
reads data from the database in UTF-8 (this concerns both the ArsDigita
Oracle Driver and the Postgres Driver).

For Oracle I used the following Bash script:
export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=$ORACLE_BASE/product/8.1.6
export ORACLE_TERM=vt100
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export NLS_LANG='POLISH_POLAND.UTF8'
export NLS_DATE_FORMAT="YYYY-MM-DD"
export ORAENV_ASK=NO

For Postgres, simply set PGCLIENTENCODING=unicode in the environment.
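Because a wrong client character set fails silently, it may be worth checking the environment at server startup. A hypothetical sketch (the proc name `db_client_encoding_ok` is made up for illustration): it inspects only the two variables named above, and only if they are set, relying on the LANGUAGE_TERRITORY.CHARSET form of NLS_LANG.

```tcl
# Hypothetical startup check: return a list of problems found in the
# database client environment (empty list means nothing suspicious).
proc db_client_encoding_ok {} {
    global env
    set problems {}
    if {[info exists env(NLS_LANG)]} {
        # NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET
        set charset [lindex [split $env(NLS_LANG) .] end]
        if {[string toupper $charset] ne "UTF8"} {
            lappend problems "NLS_LANG character set is $charset, not UTF8"
        }
    }
    if {[info exists env(PGCLIENTENCODING)]} {
        # Later PostgreSQL versions call the same encoding UTF8.
        set pg [string tolower $env(PGCLIENTENCODING)]
        if {$pg ne "unicode" && $pg ne "utf8"} {
            lappend problems "PGCLIENTENCODING is $pg, not unicode"
        }
    }
    return $problems
}
```

In a server library file one could then log each problem, e.g. `foreach p [db_client_encoding_ok] { ns_log warning $p }`.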

If your database is not in UTF-8, you should be aware of the character
expansion problem; look for my patch included in Oracle Driver 2.6,
packaged in ArsDigita AOLserver 3.3.1+ad13 at
http://www.arsdigita.com/acs-repository/older.
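Character expansion here presumably refers to byte lengths: text that fits in N bytes in a single-byte database charset can need more bytes once converted to UTF-8, so a buffer sized from the column width may be too small. A sketch that only measures lengths (the Polish sample word is arbitrary):

```tcl
# Same 6-character word, different byte lengths per encoding.
set word "za\u017c\u00f3\u0142\u0107"   ;# "zazolc" with Polish diacritics
set chars  [string length $word]
set latin2 [string length [encoding convertto iso8859-2 $word]]
set utf8   [string length [encoding convertto utf-8 $word]]
puts "characters: $chars, ISO-8859-2 bytes: $latin2, UTF-8 bytes: $utf8"
```

Each of the four accented letters takes one byte in ISO-8859-2 but two in UTF-8.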

Ad. 2 & 9 Reading TCL libraries and talking to your system

TCL libraries will be sourced properly if you set the system encoding.

I put a file in my TCL library directory that is sourced first:

tcl/1-encoding.tcl:

encoding system iso8859-2
ns_log notice "encoding system : [encoding system]"

The system encoding is also used when TCL communicates with your operating
system (e.g. sockets, as in ns_httpget).
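A small pure-Tcl sketch of what setting the system encoding does to channels: a newly opened channel starts with the system encoding and translates text through it. The scratch file name is hypothetical, and the original system encoding is restored afterwards.

```tcl
# New channels (files, sockets) start out with the system encoding.
set saved [encoding system]
encoding system iso8859-2

set path [file join [pwd] encoding-demo.txt]  ;# hypothetical scratch file
set f [open $path w]
set channel_encoding [fconfigure $f -encoding]
puts -nonewline $f "\u0105"                   ;# Polish a-ogonek
close $f

# Read the raw bytes back without any translation.
set f [open $path r]
fconfigure $f -translation binary
set raw [read $f]
close $f
file delete $path
binary scan $raw H* hex

encoding system $saved
puts "channel encoding: $channel_encoding, bytes on disk: $hex"
```

The single character is written as the one ISO-8859-2 byte 0xB1 rather than the two UTF-8 bytes.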

The following solutions/problems are specific to AOLserver 3.4:

Ad. 3 Reading ADP pages

You can configure ADP pages (by file extension) to be sourced in a
given encoding by including the following lines in the config script:

ns_section "ns/encodings"
ns_param "adp" iso8859-2
ns_param "htm" iso8859-2

Ad. 4 Writing data to browser from ADP pages

Translation from UTF-8 to the proper encoding when writing output from
ADP pages can be configured by setting the appropriate charset in the MIME
type definition:

ns_section "ns/mimetypes"
ns_param   ".htm"          "text/html; charset=iso-8859-2"
ns_param   ".adp"          "text/html; charset=iso-8859-2"
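Note that MIME charset names (iso-8859-2) and Tcl encoding names (iso8859-2) are spelled differently. To illustrate the translation step, here is a hypothetical pure-Tcl sketch (all proc names are made up; this is not AOLserver's own code) that picks the charset out of a content type and converts a page body into bytes of that charset. It handles only iso-8859-N and utf-8 names.

```tcl
# Hypothetical: "text/html; charset=iso-8859-2" -> "iso-8859-2"
proc mime_charset {content_type} {
    if {[regexp -nocase {charset=([^;[:space:]]+)} $content_type -> cs]} {
        return [string tolower $cs]
    }
    return ""
}

# Hypothetical: map a MIME charset name to a Tcl encoding name, or "".
proc tcl_encoding_for {mime_cs} {
    if {[regexp {^iso-8859-([0-9]+)$} $mime_cs -> n]} {
        set enc iso8859-$n
    } elseif {$mime_cs eq "utf-8"} {
        set enc utf-8
    } else {
        return ""
    }
    if {[lsearch -exact [encoding names] $enc] < 0} {
        return ""
    }
    return $enc
}

# Hypothetical: convert a page body (Tcl Unicode string) to output bytes.
proc encode_body {content_type body} {
    set enc [tcl_encoding_for [mime_charset $content_type]]
    if {$enc eq ""} {
        error "no Tcl encoding for content type: $content_type"
    }
    return [encoding convertto $enc $body]
}
```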


Ad. 5 & 6 Reading TCL pages and writing data from TCL pages to browser

I personally don't use TCL pages, but if you have to, then you may:

- patch modules/tcl/file.tcl to read TCL pages in a given encoding.

- patch AOLserver the way ArsDigita did in its version of AOLserver,
  where you could specify the encoding to be used for data written
  from TCL pages (ns_return/ns_write commands).
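The first option amounts to reading the page file through a channel configured with the page's encoding before evaluating it, instead of relying on the system encoding that plain `source` uses. The contents of modules/tcl/file.tcl are not reproduced here, so this is only a hypothetical stand-alone sketch of the technique (the proc name is made up); Tcl 8.5 and later also offer `source -encoding`.

```tcl
# Hypothetical helper: evaluate a Tcl file read in a given encoding.
proc source_with_encoding {path enc} {
    set f [open $path r]
    if {[catch {fconfigure $f -encoding $enc; read $f} script]} {
        close $f
        error $script
    }
    close $f
    # Evaluate at global level, like [source] called from the top level.
    return [uplevel #0 $script]
}
```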

Ad. 7 Reading form data from browser

When reading HTML form data from the browser, you should translate string
data to UTF-8 like this:

set s [encoding convertfrom iso8859-2 [ns_queryget data_in_iso8859-2]]
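If many fields need converting, the same `encoding convertfrom` call can be wrapped once. A hypothetical sketch (the proc name is made up) working on a plain list of name/value pairs so that it runs without AOLserver; inside the server the pairs might be collected from the form's ns_set. As the line above implies, it assumes each character of a raw value stands for one byte sent by the browser.

```tcl
# Hypothetical helper: decode raw form values (one char per byte, in
# $charset) into proper Tcl Unicode strings. Field names are left as-is.
proc form_pairs_to_utf8 {charset pairs} {
    set result {}
    foreach {name value} $pairs {
        lappend result $name [encoding convertfrom $charset $value]
    }
    return $result
}
```

For example, `form_pairs_to_utf8 iso8859-2 [list city "\xb3\xf3d\xbc"]` yields the city name with its Polish letters restored.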

Ad. 8 URL-encoding/decoding

In AOLserver 3.4, ns_urlencode/ns_urldecode work on binary data, so they
encode characters in URLs as UTF-8. That is improper if you read the rest
of your form data in another encoding.

You may either stop using ns_urlencode, which sometimes works for me, or
patch ns_urlencode the way ArsDigita did.
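For the patching route, the essential change is to convert the string to the target charset before percent-encoding its bytes. A hypothetical pure-Tcl sketch (proc names are made up; this is not ArsDigita's patch) that follows the form-encoding convention of writing spaces as "+":

```tcl
# Hypothetical: percent-encode $string as bytes of $charset.
proc urlencode_charset {charset string} {
    set out ""
    foreach byte [split [encoding convertto $charset $string] ""] {
        if {[regexp {^[A-Za-z0-9_.~-]$} $byte]} {
            append out $byte                   ;# unreserved ASCII
        } elseif {$byte eq " "} {
            append out +
        } else {
            scan $byte %c code                 ;# byte value 0-255
            append out [format %%%02X $code]
        }
    }
    return $out
}

# Hypothetical: undo urlencode_charset, decoding the bytes as $charset.
proc urldecode_charset {charset string} {
    set string [string map {+ " "} $string]
    set bytes ""
    set len [string length $string]
    for {set i 0} {$i < $len} {incr i} {
        set c [string index $string $i]
        set hex [string range $string [expr {$i + 1}] [expr {$i + 2}]]
        if {$c eq "%" && [string length $hex] == 2
                && [string is xdigit -strict $hex]} {
            append bytes [format %c [scan $hex %x]]
            incr i 2
        } else {
            append bytes $c
        }
    }
    return [encoding convertfrom $charset $bytes]
}
```

With ISO-8859-2 each Polish letter becomes a single %XX escape instead of the two escapes UTF-8 would produce.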

This lengthy answer is meant to give the community more information on
using AOLserver with encodings other than ASCII. If you use ISO-8859-1,
some defaults in AOLserver 3.4 are already set properly. Anyway, it's good
to understand these issues, because if you make even one mistake you will
get garbage.

If you don't use pure AOLserver but, for example, ACS with its Request
Processor, which handles encoding issues by itself, my advice will not be
very useful to you.

--tkosiak
