Re: Decode UTF-8 (and other) strings

Shawn Walker Thu, 17 Jun 2004 12:31:15 -0700

On Thu, 17 Jun 2004 11:49:46 -0700 (PDT), Mark Crispin <[EMAIL PROTECTED]> wrote:

On Thu, 17 Jun 2004, Shawn Walker wrote:
I'm trying to use utf8_mime2text() to convert a UTF-8 string to text. The sample string that I'm trying to convert is: =?UTF-8?Q?Us=C3=A9r T=C3=A9st?=
That is not a proper MIME quoted-word, and consequently utf8_mime2text() declines to deal with it.

What is the correct method of decoding the strings?
That string is properly MIME decoded as the text
        =?UTF-8?Q?Us=C3=A9r T=C3=A9st?=
A proper MIME quoted-word, such as
        =?UTF-8?Q?Us=C3=A9r_T=C3=A9st?=
would be decoded into a string with 8-bit characters.
The syntax for MIME quoted words was very carefully selected so that there would be no mistaken decodings. It is necessary to consider the effects of RFC 2822 line wrapping, which folds header lines at spaces. Consequently, spaces are forbidden within quoted words; in the Q encoding a space is written as "_" (or =20) instead.
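To make the point about strict syntax concrete, here is a minimal Python sketch of strict RFC 2047 encoded-word decoding. It is not c-client's utf8_mime2text() and assumes nothing about that function's interface; decode_q and decode_header_text are hypothetical names chosen for illustration. It handles only the Q and B encodings and deliberately skips other RFC 2047 details (the 75-character limit, dropping whitespace between adjacent encoded-words, RFC 2231 language suffixes). A word containing a space never matches the grammar, so it passes through as literal text.

```python
import base64
import re

# Hypothetical sketch of RFC 2047 encoded-word matching:
#   =?charset?encoding?encoded-text?=
# None of the fields may contain a space or "?", so a "word" with an
# embedded space never matches and is left alone as literal text.
ENCODED_WORD = re.compile(r"=\?([!->@-~]+)\?([QqBb])\?([!->@-~]*)\?=")

# Valid Q-encoded text: printable ASCII other than "=", "?" and space,
# or "=" followed by exactly two hex digits.
Q_TEXT = re.compile(r"(?:=[0-9A-Fa-f]{2}|[!-<>@-~])*")


def decode_q(text: str) -> bytes:
    """Decode Q-encoded text: "_" is a space, "=XX" is one byte in hex."""
    if not Q_TEXT.fullmatch(text):
        raise ValueError(f"malformed Q-encoded text: {text!r}")
    return re.sub(
        rb"_|=([0-9A-Fa-f]{2})",
        lambda m: b" " if m.group(0) == b"_" else bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )


def decode_header_text(value: str) -> str:
    """Decode well-formed encoded-words; leave everything else untouched."""

    def replace(m: re.Match) -> str:
        charset, encoding, payload = m.groups()
        try:
            if encoding in "Qq":
                data = decode_q(payload)
            else:
                data = base64.b64decode(payload, validate=True)
            return data.decode(charset)
        except (ValueError, LookupError):
            # Undecodable word (bad escape, bad base64, unknown charset):
            # show it as-is rather than guess what was meant.
            return m.group(0)

    return ENCODED_WORD.sub(replace, value)


print(decode_header_text("=?UTF-8?Q?Us=C3=A9r_T=C3=A9st?="))  # Usér Tést
print(decode_header_text("=?UTF-8?Q?Us=C3=A9r T=C3=A9st?="))  # printed unchanged
```

Keeping the match strict means there is no separate, rarely exercised "forgiving" branch that tries to guess what a malformed word was supposed to mean.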

I understand the argument of "why aren't you more forgiving in an obvious case such as this?". The answer is that you should take a look at Outlook. Many of the exploits in Outlook involve attacking Outlook's willingness to "just work" in "obvious cases." Without strict rules, software is deprived of clear guidelines that enable it to reject absurd cases. To make matters worse, the code paths that forgive such "obvious forgivable cases" are rarely exercised, since most data complies with the rules. This creates a fertile breeding ground for bugs.

I finally figured out that the quoted string is invalid. If it's invalid, I say "tough"; blame it on the person who formed an invalid quoted string.
