Hello,

I posted a few days ago regarding some problems I was having using
data received by AOLServer via form submission in certain charsets.

I've been looking into this problem pretty intensively, and it looks
like AOLServer is actually performing some sort of translation on data
retrieved from [ ns_conn form ] that is causing that data to become
corrupted.  This applies to data in many non-Latin1, non-Unicode
charset encodings.

Here is an example:

I have a form submission page which sends data to AOLServer using
Shift-JIS encoding.  Here is an "od -x" of the valid input data:

0000000 528e 4181 e591 528e 4281 5693 4181 c290
0000020 a282 5693 4281 6c90 4181 e28e b582 a282
0000040 6c90 4281 b993 4181 b396 c882 b993 4281
0000060 0a0d

When AOLServer receives this data, it somehow corrupts it.  If we use a
binary-configured filehandle to write the raw bytes received by
AOLServer back out to disk, we get the following "od -x" results:

0000000 528e 4181 e591 528e 4281 5693 4181 8290
0000020 93a2 8156 9042 816c 8e41 82b5 90a2 816c
0000040 9342 81b9 9641 82b3 b913 4281 0a20
0000056

As you can see, the first seven words (14 bytes) match the input, and
the data is corrupted from there on.
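
For anyone who wants to check this mechanically rather than by eye,
here is a minimal Python sketch that compares the two dumps word by
word.  It is only an illustration: the helper names (od_words,
first_mismatch) are made up for this example, and it works on the
16-bit words exactly as od printed them, so it makes no assumption
about byte order.

```python
# Word-level comparison of the two "od -x" dumps quoted above.
VALID_DUMP = """
0000000 528e 4181 e591 528e 4281 5693 4181 c290
0000020 a282 5693 4281 6c90 4181 e28e b582 a282
0000040 6c90 4281 b993 4181 b396 c882 b993 4281
0000060 0a0d
"""

RECEIVED_DUMP = """
0000000 528e 4181 e591 528e 4281 5693 4181 8290
0000020 93a2 8156 9042 816c 8e41 82b5 90a2 816c
0000040 9342 81b9 9641 82b3 b913 4281 0a20
0000056
"""


def od_words(dump):
    """Return the 16-bit words of an od -x dump, skipping the offset column."""
    words = []
    for line in dump.strip().splitlines():
        fields = line.split()
        words.extend(int(field, 16) for field in fields[1:])
    return words


def first_mismatch(a, b):
    """Index of the first differing word, or None if the common prefix matches."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


valid = od_words(VALID_DUMP)
received = od_words(RECEIVED_DUMP)
idx = first_mismatch(valid, received)

print("valid length in bytes:   ", 2 * len(valid))
print("received length in bytes:", 2 * len(received))
if idx is not None:
    print("first differing word:", idx, "at byte offset", 2 * idx)
```

Besides locating the first differing word, this also reports the total
length of each dump, which is worth comparing as well.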

If you'd like to download the valid Shift-JIS data, it can be found
(wrapped in html) at http://www.nevernever.org/~thomas/shiftjis.html


If we use [ encoding convertfrom shiftjis ] to convert this data to
Unicode, then write the converted data out to disk using a
Unicode-configured filehandle, we get the following data:

0000000 5c71 3001 5927 5c71 3002 5929 3001 5782
0000020 ff62 5929 3002 4eba 3001 4e03 3044 4eba
0000040 3002 9053 3001 7121 0082 0013 ff79 3002
0000060 0020 000a

Compare this to the *actual* Unicode representation of this data:

0000000 5c71 3001 5927 5c71 3002 5929 3001 9752
0000020 3044 5929 3002 4eba 3001 5bc2 3057 3044
0000040 4eba 3002 9053 3001 7121 306a 9053 3002
0000060 000a

Again, after the first 14 bytes, we have corrupted data.  The same type
of corruption happens with EUC-JP.  It appears to also happen with
EUC-KR, ks_c_5601-1987, Big5 and several others, although this is just
from a visual test.  I didn't perform od's of these encodings.
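
The reference Unicode dump can also be cross-checked outside of Tcl.
The sketch below is a rough illustration, not part of my AOLServer
setup.  It rebuilds the valid Shift-JIS bytes from the first od dump,
decodes them with Python's shift_jis codec, and compares the result
with the two Unicode dumps.  Two assumptions are baked in and marked
in the comments: that od ran on a little-endian machine (so the low
byte of each word comes first in the file), and that the CRLF at the
end of the input was normalized to a bare LF, since the reference dump
ends in 000a only.  The helper name od_words is made up for this
example.

```python
import struct

VALID_SJIS_DUMP = """
0000000 528e 4181 e591 528e 4281 5693 4181 c290
0000020 a282 5693 4281 6c90 4181 e28e b582 a282
0000040 6c90 4281 b993 4181 b396 c882 b993 4281
0000060 0a0d
"""

EXPECTED_UNICODE_DUMP = """
0000000 5c71 3001 5927 5c71 3002 5929 3001 9752
0000020 3044 5929 3002 4eba 3001 5bc2 3057 3044
0000040 4eba 3002 9053 3001 7121 306a 9053 3002
0000060 000a
"""

CONVERTED_UNICODE_DUMP = """
0000000 5c71 3001 5927 5c71 3002 5929 3001 5782
0000020 ff62 5929 3002 4eba 3001 4e03 3044 4eba
0000040 3002 9053 3001 7121 0082 0013 ff79 3002
0000060 0020 000a
"""


def od_words(dump):
    """Return the 16-bit words of an od -x dump, skipping the offset column."""
    words = []
    for line in dump.strip().splitlines():
        words.extend(int(field, 16) for field in line.split()[1:])
    return words


# ASSUMPTION: little-endian host, so each od word is stored low byte first.
sjis_bytes = b"".join(struct.pack("<H", w) for w in od_words(VALID_SJIS_DUMP))

# ASSUMPTION: CRLF was normalized to LF before the reference dump was written.
text = sjis_bytes.decode("shift_jis").replace("\r\n", "\n")
decoded = [ord(ch) for ch in text]

expected = od_words(EXPECTED_UNICODE_DUMP)
converted = od_words(CONVERTED_UNICODE_DUMP)

print("independent decode matches reference dump:", decoded == expected)

# Locate where the AOLServer-side conversion first departs from the reference.
first_bad = next(
    (i for i, (x, y) in enumerate(zip(expected, converted)) if x != y), None
)
if first_bad is not None:
    print("first differing word:", first_bad, "at byte offset", 2 * first_bad)
```

If the independent decode agrees with the reference dump, the input
file itself is sound, and the damage happens somewhere between the
form submission and the data I get back from [ ns_conn form ].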

Interestingly enough, there appear to be some exceptions to this type
of data corruption.  If you send Unicode-encoded data to AOLServer, it
is interpreted and received without errors.  The same holds true for
ISO-8859-1, ISO-2022-JP and ISO-2022-KR (this is not an exhaustive
list).

Also interesting - if you take the corrupted data (Shift-JIS, for
example) and write it out to the browser using ns_return or ns_write,
it renders perfectly in the browser.  This implies that the output
method is performing the inverse of whatever corruption [ ns_conn form
] is causing.

We can verify this in a simple manner:

Construct an .adp that reads a Shift-JIS text file using a
binary-configured filehandle and outputs this data without conversion
using ns_write or ns_return.  Since some sort of transformation is
being applied to the outgoing data, the output will not be readable in
the browser window using Shift-JIS encoding.  The example text is shown
with a small Katakana "tsu" between most characters, and some totally
incorrect characters.

We can also verify that Unicode data is not affected.  If we modify our
.adp to read our Shift-JIS text file using a shiftjis-configured
filehandle (thus converting the Shift-JIS data to Unicode in the Tcl
core), this data can be sent out to the client browser and rendered
perfectly.

This becomes a problem if we want to do anything with user data other
than display it using AOLServer.  I haven't been able to figure out
exactly what transformations are being applied to the data - does
anybody have an idea?  I've tested this on AOLServer 3.4p2 as well as
AOLServer 4, with the same results.  Both installations are on Linux.

If anybody could help shed some light on this issue, I would definitely
appreciate it!

thomas park


