I've tested the upload with all the browsers I have:
- IE6 (Windows XP)
- IE7 (Windows XP)
- Firefox 2 (Windows XP)
- Konqueror (Knoppix)
All gave the same wrong result.

Philip

Otto Stolz wrote:
> Hello,
>
> Andrew Morgan wrote:
>> I tested this with Iceweasel (Firefox) 2.0.0.12 on Debian Unstable as
>> the client, and the latest stable releases of Horde and IMP with PHP5
>> on Debian Etch as the server. The browser says the content-type is
>> text/plain when it uploads the attachment. Here is the exact
>> attachment that was sent in the email:
>>
>> --=_3uemsho7ppkw
>> Content-Type: text/plain;
>>     charset=UTF-8;
>>     name="unicode.txt"
>> Content-Disposition: attachment;
>>     filename="unicode.txt"
>> Content-Transfer-Encoding: quoted-printable
>>
>> =FF=FEt=00h=00i=00s=00 =00i=00s=00 =00a=00 =00t=00e=00s=00t=00=0D=00
>> =00i=00n=00 =00U=00T=00F=001=006=00=0D=00
>> =00=0D=00
>> =00P=00h=00i=00l=00i=00p=00 =00S=00t=00e=00e=00m=00a=00n=00=0D=00
>> =00
>> --=_3uemsho7ppkw--
>>
>> It used quoted-printable encoding instead of Base64. I'm not a
>> quoted-printable whiz, but it appears that the high-order bytes get
>> encoded as 00 (NUL) values. When I download this same attachment
>> using IMP, it is identical to your original unicode.txt file.
>> However, I suspect Thunderbird and Outlook are not combining the two
>> bytes of data back together (=FF=FE into FFFE) but are trying to
>> render the NUL character.
>
> The problem is the wrong charset specification: for UTF-16 encoded text,
> it should, of course, read “UTF-16” rather than “UTF-8”. I guess that
> wrong specification stems from the browser used to upload that file.
>
> Because of that wrong specification, the addressee will not interpret the
> text as intended. In particular:
> - The individual bytes will not be assembled into 16-bit units.
> - Any bytes above 127 will be interpreted according to UTF-8 rules;
>   in particular, the two leading bytes (meant as BOM) will be considered
>   illegal input values, and most probably be replaced with Replacement
>   Characters U+FFFD.
> - In due course, the endianness of the UTF-16 text will be lost.
>   That particular text is little-endian; the UTF-8 bytes will be
>   interpreted in the opposite sequence. Hence, the two halves of each
>   16-bit unit will effectively be swapped, and even if you try
>   to read the attachment as a UTF-16 file, you’ll be out of luck.
>
> The quoted-printable encoding is all right; the Content-Transfer-Encoding
> is totally irrelevant to the problems the two preceding posts in this
> thread have described.
>
> Good luck,
>   Otto Stolz

--
IMP mailing list - Join the hunt: http://horde.org/bounties/#imp
Frequently Asked Questions: http://horde.org/faq/
To unsubscribe, mail: [EMAIL PROTECTED]
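
The charset mismatch described in the quoted message can be reproduced with a short Python sketch. This is only an illustration using the standard library, not code from Horde, IMP, or any mail client; the variable names are made up, and only the first line of the quoted attachment body is used. It suggests that the quoted-printable layer round-trips the bytes intact, and that the damage appears once those bytes are read as UTF-8 instead of UTF-16.

```python
import quopri

# First line of the quoted-printable body from the attachment quoted above,
# followed by the "=00" that starts the next line (second half of the LF).
qp_body = (
    b"=FF=FEt=00h=00i=00s=00 =00i=00s=00 =00a=00 =00t=00e=00s=00t=00=0D=00\n"
    b"=00"
)

# Undo the Content-Transfer-Encoding: this recovers the original bytes,
# including the UTF-16 little-endian BOM (FF FE).
raw = quopri.decodestring(qp_body)

# Correct interpretation: the BOM selects little-endian and is consumed.
as_utf16 = raw.decode("utf-16")

# What a client does if it trusts "charset=UTF-8": FF and FE are not valid
# UTF-8, so each becomes U+FFFD, and every 00 byte survives as a NUL.
as_utf8 = raw.decode("utf-8", errors="replace")

# With the BOM gone, a reader guessing big-endian pairs the bytes the
# wrong way round and gets unrelated characters.
swapped = raw[2:].decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf8))
print(repr(swapped))
```

If this matches what the receiving clients do, then labelling the part as charset=UTF-16 (or converting the text to UTF-8 before sending) would be the fix, and switching to Base64 would change nothing.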
