Hello,

Andrew Morgan wrote:
> I tested this with Iceweasel (Firefox) 2.0.0.12 on Debian Unstable as the
> client, and the latest stable releases of Horde and IMP with PHP5 on
> Debian Etch as the server. The browser says the content-type is
> text/plain when it uploads the attachment. Here is the exact attachment
> that was sent in the email:
>
> --=_3uemsho7ppkw
> Content-Type: text/plain;
>     charset=UTF-8;
>     name="unicode.txt"
> Content-Disposition: attachment;
>     filename="unicode.txt"
> Content-Transfer-Encoding: quoted-printable
>
> =FF=FEt=00h=00i=00s=00 =00i=00s=00 =00a=00 =00t=00e=00s=00t=00=0D=00
> =00i=00n=00 =00U=00T=00F=001=006=00=0D=00
> =00=0D=00
> =00P=00h=00i=00l=00i=00p=00 =00S=00t=00e=00e=00m=00a=00n=00=0D=00
> =00
> --=_3uemsho7ppkw--
>
> It used quoted-printable encoding instead of Base64. I'm not a
> quoted-printable whiz, but it appears that the high-order bits get encoded
> as 00 (NUL) values. When I download this same attachment using IMP, it is
> identical to your original unicode.txt file. However, I suspect
> Thunderbird and Outlook are not combining the two bytes of data back
> together (=FF=FE into FFEE) but are trying to render the NUL character.

The problem is the wrong charset specification: for a UTF-16 encoded
text it should, of course, read “UTF-16” rather than “UTF-8”. I guess
that the wrong specification stems from the browser used to upload the
file.

Because of that wrong specification, the addressee will not interpret
the text as intended. In particular:
- The individual bytes will not be assembled into 16-bit units.
- Any bytes above 127 will be interpreted according to UTF-8 rules; in
  particular, the two leading bytes (meant as a BOM) will be considered
  illegal input and will most probably be replaced with Replacement
  Characters, U+FFFD.
- As a consequence, the endianness of the UTF-16 text will be lost. That
  particular text is little-endian; the UTF-8 bytes will be interpreted
  in the opposite sequence. Hence the two halves of each 16-bit unit
  will effectively be swapped, and even if you try to read the
  attachment as a UTF-16 file, you’ll be out of luck.

The quoted-printable encoding is all right; the Content-Transfer-Encoding
is totally irrelevant to the problems described in the two preceding
posts in this thread.

Good luck,
  Otto Stolz
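
The effect of the charset label can be sketched in a few lines of Python, using an excerpt of the attachment body from the quoted message. This is only an illustration: the variable names are made up here, and a real mail client may treat the invalid bytes differently from Python's errors="replace" handling.

```python
import base64
import quopri

# First line of the quoted-printable body, without the trailing =0D=00.
qp_body = b"=FF=FEt=00h=00i=00s=00 =00i=00s=00 =00a=00 =00t=00e=00s=00t=00"

# Undo the Content-Transfer-Encoding; this yields the raw file bytes.
raw = quopri.decodestring(qp_body)

# Decoded with the correct charset: the BOM FF FE selects little-endian
# and is consumed, leaving the intended text.
as_utf16 = raw.decode("utf-16")

# Decoded with the charset the message actually declares: FF and FE are
# never valid in UTF-8, and every second byte becomes a NUL character.
as_utf8 = raw.decode("utf-8", errors="replace")

# The transfer encoding plays no part: Base64 would deliver the same bytes.
via_base64 = base64.b64decode(base64.b64encode(raw))

print(repr(as_utf16))
print(repr(as_utf8))
print(via_base64 == raw)
```

With the correct label the text comes out intact; with the UTF-8 label it begins with two replacement characters and has a NUL after every letter, whichever transfer encoding carried the bytes.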
