Re: [jira] Assigned: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly

Markus Wiederkehr Mon, 16 Feb 2009 06:13:25 -0800

On Mon, Feb 16, 2009 at 2:49 PM, Stefano Bagnara <[email protected]> wrote:
> Markus Wiederkehr wrote:
>> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
>>
>> I think that in the course of MIME4J-116 we should (maybe) create
>> Field instances in AbstractEntity instead of later on in
>> MessageBuilder. A Field object could store the raw data in a byte[]
>> instead of a String which would greatly help with MIME4J-112.
>>
>> The only problem is that the charset for a lenient parsing mode is not
>> known at this early point. But considering your clarification about
>> the lenient writing mode I wonder if anybody really needs a lenient
>> parsing mode. (I wonder if anyone really needs a lenient writing mode
>> for that matter.)
>
> Lenient writing, IMO, is only needed if you need round-tripping. For
> most standard MIME4J usages I don't see why we should write malformed
> data to the output.


In my opinion Field should preserve the original bytes in a byte
array. Writing a message could simply use these original bytes and
there would be no roundtrip issues. Essentially there would be only
one writing mode.
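A minimal Java sketch of the byte[]-backed field idea, under stated assumptions: `RawBytesField`, `toWireBytes()` and `RawFieldDemo` are hypothetical names made up for illustration, not existing mime4j types. The point it shows is that if the field keeps the bytes it was parsed from, writing it back is a verbatim copy, so no charset (and no lenient/strict writing distinction) is involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RawFieldDemo {
    public static void main(String[] args) throws IOException {
        // An unencoded 8-bit Subject, as some clients write it (ISO-8859-1 bytes here)
        byte[] parsed = "Subject: Gr\u00fc\u00dfe".getBytes(StandardCharsets.ISO_8859_1);
        RawBytesField field = new RawBytesField(parsed);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        field.writeTo(out);

        System.out.println(field.getName()); // Subject
        // The written bytes are the parsed bytes plus CRLF: nothing was re-encoded
        System.out.println(Arrays.equals(out.toByteArray(), field.toWireBytes())); // true
    }
}

/**
 * Hypothetical class, not part of the mime4j API: a header field that keeps
 * the exact bytes it was parsed from.
 */
final class RawBytesField {
    private final byte[] raw; // header line as read, without the trailing CRLF

    RawBytesField(byte[] raw) {
        this.raw = Arrays.copyOf(raw, raw.length); // defensive copy
    }

    /** The name is everything before the first ':'; valid field names are ASCII. */
    String getName() {
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == ':') {
                return new String(raw, 0, i, StandardCharsets.US_ASCII).trim();
            }
        }
        return new String(raw, StandardCharsets.US_ASCII).trim();
    }

    /** Original bytes followed by CRLF; no charset is involved. */
    byte[] toWireBytes() {
        byte[] wire = Arrays.copyOf(raw, raw.length + 2);
        wire[raw.length] = '\r';
        wire[raw.length + 1] = '\n';
        return wire;
    }

    /** The single writing mode: copy the original bytes verbatim. */
    void writeTo(OutputStream out) throws IOException {
        out.write(toWireBytes());
    }
}
```

Any "tidying" (re-encoding 8-bit text as encoded-words, refolding, etc.) would then be a separate, explicit step rather than a writing mode.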

In addition, I would like to have a "visitor" or something similar that
can be used to tidy up a message.

> Lenient reading, on the other hand, is part of being a generic parsing
> library: most email clients correctly handle 8-bit chars in the Subject
> header because some email clients write them unencoded. If you
> think mime4j could be used as the library for an email client, it is
> probably still worth handling 8-bit chars in the headers.
> Of course there is no need to implement such a feature until someone
> really asks for or needs it.

My approach would still allow for that, with a little overhead. If a
ContentHandler receives a Field and that field contains the original
raw bytes, then nothing prevents the ContentHandler from parsing the
field again, using a charset determined by whatever means. Also,
structured fields are parsed lazily, so the overhead would not be
tremendous.
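A rough Java sketch of what such a ContentHandler could do with the raw bytes, contrasted with the early per-byte cast described in MIME4J-118. Assumptions: the class and method names are hypothetical, the cast is shown masked to 0..255 (which makes it equivalent to ISO-8859-1), and "try strict UTF-8, fall back to ISO-8859-1" is just one possible heuristic, not something the thread or mime4j prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientHeaderDecoding {

    /** Early conversion as described in MIME4J-118: one char per byte. */
    static String castDecode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF)); // masked cast, i.e. ISO-8859-1
        }
        return sb.toString();
    }

    /**
     * Deferred decoding inside a (hypothetical) ContentHandler: try strict UTF-8
     * first and fall back to ISO-8859-1 if the bytes are not valid UTF-8.
     * This heuristic is only one example of "whatever means".
     */
    static String decodeLenient(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String subject = "Subject: Gr\u00fc\u00dfe";
        byte[] utf8 = subject.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = subject.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(castDecode(utf8).equals(subject));      // false: mojibake
        System.out.println(decodeLenient(utf8).equals(subject));   // true
        System.out.println(decodeLenient(latin1).equals(subject)); // true
    }
}
```

Because the raw bytes survive, the charset decision is made by the handler that has the context, and a handler that does not care can still decode as US-ASCII.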

> I don't really know how many email messages contain unencoded headers
> nowadays. Ten years ago, when I checked this stuff in depth, almost 40% of
> international emails included unencoded headers. I expect this
> percentage to be much lower today, but I don't know if it is 10% or 0.1%.
>
> Stefano
>
>> So maybe AbstractEntity should simply use US-ASCII to decode the
>> header fields without direct support for a lenient parsing mode that
>> nobody needs. Then AbstractEntity can build Field instances and a
>> ContentHandler receives those Field instances without having to parse
>> them again.
>>
>> All in all, I'm not sure whether #118 should be addressed independently
>> of 112 and 116, and whether 118 should be targeted for 0.6.
>>
>> But those are just my 2 cents,
>>
>> Markus
>>
>>
>> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
>> <[email protected]> wrote:
>>>     [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Oleg Kalnichevski reassigned MIME4J-118:
>>> ----------------------------------------
>>>
>>>    Assignee: oleg.kalnichevski
>>>
>>> Working on a patch
>>>
>>> Oleg
>>>
>>>> MIME stream parser handles non-ASCII fields incorrectly
>>>> -------------------------------------------------------
>>>>
>>>>                 Key: MIME4J-118
>>>>                 URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>>             Project: JAMES Mime4j
>>>>          Issue Type: Bug
>>>>            Reporter: Oleg Kalnichevski
>>>>            Assignee: oleg.kalnichevski
>>>>             Fix For: 0.6
>>>>
>>>>
>>>> Presently the MIME stream parser handles non-ASCII fields incorrectly. Binary
>>>> field content gets converted to its textual representation too early in
>>>> the parsing process, using a simple byte-to-char cast. The decision about
>>>> the appropriate char encoding should be left up to individual ContentHandler
>>>> implementations.
>>>> Oleg
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
