On Thu, 20 Sep 2001, Toni Vila wrote:

> We have recently upgraded our AOLserver 3.0 to AOLServer 3.4 due to the
> Auth problem.
>
> But now we notice some problems with some characters (accents), which on
> the previous version did not appear.
>
> In our Oracle 8.1.6 database, we have the following text: "Cerámica
> Castellón", which should be "Cerámica Castellón".
>
> In AOLServer v3.0, it was displayed on the web correctly ("Cerámica
> Castellón"), but in AOLServer 3.4, it's just "Cerámica Castellón", and we
> don't know how to translate it back to the correct charset.
>
> It seems to me a Multibyte charset, as maybe Unicode, but although choosing
> a ORA_NLS_LANG of "_.UTF8" or similar, it doesn't change anything.
>
> I haven't changed any line in our configuration file, nor in the database
> between versions.
>
> Do you know what may have happened or how to correct it?

I have had very similar problems with different versions of AOLserver 3.x, so here is my advice.

If you want AOLserver to work correctly with characters in an encoding other than ASCII, you have to be very careful and use the proper encoding while:

1. Talking to your database.
2. Sourcing your Tcl libraries at server startup.
3. Reading your ADP pages.
4. Writing data to the browser from ADP pages.
5. Reading your Tcl pages.
6. Writing data to the browser from Tcl pages.
7. Reading form data from the browser.
8. URL-encoding/decoding data.
9. Talking to your system.

For a general explanation see the ArsDigita guide "Building a Multilingual Web Service Using ACS" (http://www.arsdigita.com/asj/multilingual/), and look more closely at the chapter "Character Set Encoding" (http://www.arsdigita.com/asj/multilingual/encoding.adp). Keep in mind that the solutions described there work only with the ArsDigita version of AOLserver (the last one is AOLserver-3.3.1+ad13 at http://www.arsdigita.com/acs-repository/older), which will probably no longer be maintained. In AOLserver 3.4 character encoding is handled differently.
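The garbling shown in the quoted question is what you get when UTF-8 bytes are interpreted as ISO-8859-1 somewhere along the chain listed above. Here is a minimal sketch of that effect, assuming plain Tcl 8.x (tclsh, no AOLserver needed); the variable names are mine and only for illustration, and this shows the mechanism, not necessarily the exact place where it happens in a given installation.

```tcl
# The correct text as a Tcl (Unicode) string; \u escapes keep this
# script independent of the encoding of the source file itself.
set good "Cer\u00e1mica Castell\u00f3n"

# Step 1: serialize to UTF-8 bytes, as a UTF-8 client would send them.
set bytes [encoding convertto utf-8 $good]

# Step 2 (the mistake): interpret those bytes as ISO-8859-1.
# Each accented letter turns into two characters ("a-acute" becomes
# A-tilde followed by an inverted exclamation mark).
set garbled [encoding convertfrom iso8859-1 $bytes]

# Step 3 (the repair): undo the wrong step, then decode as UTF-8.
set repaired [encoding convertfrom utf-8 \
    [encoding convertto iso8859-1 $garbled]]

puts "garbled : $garbled"
puts "repaired: $repaired"
```

The repair in step 3 only works while the garbled text has not been damaged further (for example by a second wrong conversion), so it is better to fix the encoding at each of the nine places than to patch strings afterwards.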
In general, Tcl scripts have to operate on strings encoded in UTF-8, because Tcl 8.x is designed that way. So you have to convert strings carefully whenever they are read into or written out of Tcl scripts. My examples use the ISO-8859-2 encoding, which is the right one for Polish, but you should change it to whatever is appropriate for you.

Ad. 1 Talking to your database

First you have to ensure that AOLserver, as a database client, reads and writes data in UTF-8 (this concerns both the ArsDigita Oracle Driver and the Postgres Driver). For Oracle I used the following Bash script:

    export ORACLE_BASE=/opt/oracle
    export ORACLE_HOME=$ORACLE_BASE/product/8.1.6
    export ORACLE_TERM=vt100
    export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
    export NLS_LANG='POLISH_POLAND.UTF8'
    export NLS_DATE_FORMAT="YYYY-MM-DD"
    export ORAENV_ASK=NO

For Postgres simply set in the environment:

    PGCLIENTENCODING=unicode

If your database is not in UTF-8 you should be aware of the character expansion problem; look for my patch included in Oracle Driver 2.6, packaged with ArsDigita AOLserver 3.3.1+ad13 at http://www.arsdigita.com/acs-repository/older.

Ad. 2 & 9 Reading Tcl libraries and talking to your system

Tcl libraries will be sourced properly if you set the system encoding. I put a file in my Tcl library which is sourced first:

tcl/1-encoding.tcl:

    encoding system iso8859-2
    ns_log notice "encoding system : [encoding system]"

The system encoding is also used when Tcl communicates with your operating system (e.g. sockets, ns_httpget).

The following solutions/problems are specific to AOLserver 3.4:

Ad. 3 Reading ADP pages

You can configure ADP pages (by file extension) to be sourced in a given encoding by including the following lines in the config script:

    ns_section "ns/encodings"
    ns_param "adp" iso8859-2
    ns_param "htm" iso8859-2

Ad. 4 Writing data to browser from ADP pages

Translation from UTF-8 to the proper encoding when writing output from ADP pages can be configured by setting the appropriate charset in the MIME type definition:

    ns_section "ns/mimetypes"
    ns_param ".htm" "text/html; charset=iso-8859-2"
    ns_param ".adp" "text/html; charset=iso-8859-2"

Ad. 5 & 6 Reading Tcl pages and writing data from Tcl pages to browser

I personally don't use Tcl pages. But if you have to, then you may:

- patch modules/tcl/file.tcl to read Tcl pages in a given encoding.
- patch AOLserver the way ArsDigita did in its version of AOLserver, where you could specify the encoding to be used for data written from Tcl pages (ns_return/ns_write commands).

Ad. 7 Reading form data from browser

When reading HTML form data from the browser you should translate string data to UTF-8 like this:

    set s [encoding convertfrom iso8859-2 [ns_queryget data_in_iso8859-2]]

Ad. 8 URL-encoding/decoding

In AOLserver 3.4 ns_urlencode/ns_urldecode work on binary data, so ns_urlencode encodes characters in URLs as UTF-8, and that is improper if you read the rest of your form data in another encoding. You may stop using ns_urlencode, which sometimes works for me, or patch ns_urlencode the way ArsDigita did.

This lengthy answer is meant to give the community more information on using AOLserver with encodings other than ASCII. If you use ISO-8859-1, some defaults in AOLserver 3.4 are already set properly. Still, it's good to understand these issues, because if you make one mistake you will get garbage. If you don't use pure AOLserver but, for example, ACS with its Request Processor, which handles encoding issues by itself, my advice will not be very useful to you.

--tkosiak