Hello, I posted a few days ago about some problems I was having with form data that AOLServer receives in certain charsets.
I've been looking into this problem intensively, and it looks like AOLServer is performing some sort of translation on the data returned by [ ns_conn form ] that corrupts it. This affects data in many charset encodings that are neither Latin-1 nor Unicode.

Here is an example. I have a form page that submits data to AOLServer in Shift-JIS encoding. This is an "od -x" dump of the valid input data:

    0000000 528e 4181 e591 528e 4281 5693 4181 c290
    0000020 a282 5693 4281 6c90 4181 e28e b582 a282
    0000040 6c90 4281 b993 4181 b396 c882 b993 4281
    0000060 0a0d

When AOLServer receives this data, it somehow corrupts it. If we write the raw bytes received by AOLServer back out to disk through a binary-configured filehandle, od gives:

    0000000 528e 4181 e591 528e 4281 5693 4181 8290
    0000020 93a2 8156 9042 816c 8e41 82b5 90a2 816c
    0000040 9342 81b9 9641 82b3 b913 4281 0a20
    0000056

As you can see, the first seven characters (14 bytes) come through intact, and everything after that is corrupted. If you'd like to download the valid Shift-JIS data, it can be found (wrapped in HTML) at http://www.nevernever.org/~thomas/shiftjis.html

If we use [ encoding convertfrom shiftjis ] to convert this data to Unicode and then write the converted data to disk through a Unicode-configured filehandle, we get:

    0000000 5c71 3001 5927 5c71 3002 5929 3001 5782
    0000020 ff62 5929 3002 4eba 3001 4e03 3044 4eba
    0000040 3002 9053 3001 7121 0082 0013 ff79 3002
    0000060 0020 000a

Compare this to the *actual* Unicode representation of this data:

    0000000 5c71 3001 5927 5c71 3002 5929 3001 9752
    0000020 3044 5929 3002 4eba 3001 5bc2 3057 3044
    0000040 4eba 3002 9053 3001 7121 306a 9053 3002
    0000060 000a

Again, everything after the first 14 bytes is corrupted. The same kind of corruption happens with EUC-JP, and it appears to happen with EUC-KR, ks_c_5601-1987, Big5 and several others as well, although for those I have only checked visually.
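For anyone who wants to check these dumps without a running AOLServer, here is a small offline analysis sketch in Python (not Tcl, and not AOLServer code). It assumes the dumps were taken on a little-endian Linux machine, where "od -x" prints each 16-bit word in host byte order, so "528e" stands for the bytes 8e 52. It rebuilds the raw bytes, decodes the valid input as Shift-JIS (which should reproduce the "actual" Unicode listing), and reports the first byte at which the received data differs. The helper utf8_then_low_byte is a hypothetical name and a hypothesis to test, not a known description of what AOLServer does: it assumes the form bytes are decoded as UTF-8 with one character kept per invalid byte, and that the binary filehandle then writes only the low 8 bits of each character.

```python
"""Offline analysis of the od -x dumps in this message.

Hypothetical helper script, not part of AOLServer. Assumes the dumps were
made on a little-endian machine, where od -x prints 16-bit words in host
byte order.
"""


def od_x_to_bytes(dump: str) -> bytes:
    """Turn `od -x` text back into raw bytes.

    7-digit tokens are octal offsets and are skipped. 4-digit tokens are
    16-bit words stored little-endian, so "528e" means bytes 8e 52.
    """
    out = bytearray()
    for token in dump.split():
        if len(token) == 4:
            out += int(token, 16).to_bytes(2, "little")
    return bytes(out)


def first_divergence(a: bytes, b: bytes) -> int:
    """Index of the first differing byte (or the shorter length if none)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


def utf8_then_low_byte(raw: bytes) -> bytes:
    """HYPOTHESIS ONLY: decode raw bytes as UTF-8, keeping each invalid
    byte as its own character, then keep only the low 8 bits of every
    character, as a binary-configured channel would.

    surrogateescape maps an invalid byte 0xNN to U+DCNN, so `& 0xFF`
    gives the original byte back; a valid UTF-8 sequence such as c2 82
    (U+0082) collapses into a single byte.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return bytes(ord(ch) & 0xFF for ch in text)


# Valid Shift-JIS input, copied from the first dump.
VALID = od_x_to_bytes("""
0000000 528e 4181 e591 528e 4281 5693 4181 c290
0000020 a282 5693 4281 6c90 4181 e28e b582 a282
0000040 6c90 4281 b993 4181 b396 c882 b993 4281
0000060 0a0d
""")

# Bytes written back out after passing through [ ns_conn form ].
CORRUPTED = od_x_to_bytes("""
0000000 528e 4181 e591 528e 4281 5693 4181 8290
0000020 93a2 8156 9042 816c 8e41 82b5 90a2 816c
0000040 9342 81b9 9641 82b3 b913 4281 0a20
0000056
""")

if __name__ == "__main__":
    # Print code points in hex so the output stays ASCII-only.
    text = VALID.decode("shift_jis").rstrip("\r\n")
    print(" ".join(f"{ord(c):04x}" for c in text))
    print("first differing byte index:", first_divergence(VALID, CORRUPTED))
    emulated = utf8_then_low_byte(VALID)
    print("hypothesis matches all but the line ending:",
          emulated[:-2] == CORRUPTED[:-2])
```

On these dumps, the hypothetical emulation reproduces the corrupted bytes exactly except at the very end, where the original CR (0d) arrives as a space (20). That last difference would need some other explanation.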
I didn't capture od dumps for the EUC-KR, ks_c_5601-1987 or Big5 data.

Interestingly, there appear to be some exceptions to this kind of corruption. If you send Unicode-encoded data to AOLServer, it is received without errors. The same holds true for ISO-8859-1, ISO-2022-JP and ISO-2022-KR (this is not an exhaustive list).

Also interesting: if you take the corrupted data (Shift-JIS, for example) and write it back out to the browser with ns_return or ns_write, it renders perfectly. This implies that the output path performs the inverse of whatever transformation [ ns_conn form ] applies.

We can verify this simply. Construct an .adp that reads a Shift-JIS text file through a binary-configured filehandle and outputs the data without conversion using ns_write or ns_return. Because a transformation is applied to the outgoing data, the output is not readable in the browser window with the Shift-JIS encoding selected: the example text is shown with a small katakana "tsu" between most characters, plus some completely wrong characters.

We can also verify that Unicode data is not affected. If we modify the .adp to read the Shift-JIS file through a shiftjis-configured filehandle (so that the Tcl core converts the Shift-JIS data to Unicode), the data is sent to the client browser and rendered perfectly.

This becomes a problem as soon as we want to do anything with user data other than display it through AOLServer. I haven't been able to work out exactly what transformation is being applied to the data. Does anybody have an idea?

I've tested this on AOLServer 3.4p2 as well as AOLServer 4, with the same results; both installations are on Linux. If anybody could shed some light on this issue, I would definitely appreciate it!

thomas park
