On Mon, Feb 16, 2009 at 2:49 PM, Stefano Bagnara <[email protected]> wrote:
> Markus Wiederkehr wrote:
>> In my opinion this issue is closely related to MIME4J-112 and MIME4J-116.
>>
>> I think that in the course of MIME4J-116 we should (maybe) create
>> Field instances in AbstractEntity instead of later on in
>> MessageBuilder. A Field object could store the raw data in a byte[]
>> instead of a String which would greatly help with MIME4J-112.
>>
>> The only problem is that the charset for a lenient parsing mode is not
>> known at this early point. But considering your clarification about
>> the lenient writing mode I wonder if anybody really needs a lenient
>> parsing mode. (I wonder if anyone really needs a lenient writing mode
>> for that matter.)
>
> Lenient Writing IMO is only needed if you need roundtrip. For
> standard/most MIME4J usages I don't see why we should write malformed
> data in output.

In my opinion Field should preserve the original bytes in a byte
array. Writing a message could then simply reuse these original bytes,
and there would be no roundtrip issues. Essentially there would be only
one writing mode.

In addition, I would like to have a "visitor" or similar mechanism
that can be used to tidy up a message.

> Lenient reading instead is part of being a generic parsing library:
> most email clients correctly handle 8bit chars in the Subject header
> because it happens than some email client writes them unencoded. If you
> think mime4j could be used as the library for an email client it
> probably still worth handling 8bit chars in the headers.
> Of course there is no need to implement such a feature until someone
> really ask/need it.

My approach would still allow for that with a little overhead. If a
ContentHandler receives a Field and that field contains the original
raw bytes, then nothing prevents the ContentHandler from parsing the
field again with any charset, however that charset is determined.
Also, structured fields are parsed lazily, so the overhead would not
be tremendous.

> I don't really know nowadays how many email messages contains unencoded
> headers. 10 years ago, when I checked this stuff deeply almost 40% of
> international emails included unencoded headers. I expect this
> percentage to be much less today, but I don't know if it is 10% or 0.1%.
>
> Stefano
>
>> So maybe AbstractEntity should simply use US-ASCII to decode the
>> header fields without direct support for a lenient parsing mode that
>> nobody needs. Then AbstractEntity can build Field instances and a
>> ContentHandler receives those Field instances without having to parse
>> them again.
>>
>> All in all I'm not sure if #118 should be addressed independently of
>> 112 and 116 and whether 118 should be targeted for 0.6..
>>
>> But those are just my 2 cents,
>>
>> Markus
>>
>>
>> On Mon, Feb 16, 2009 at 1:27 PM, Oleg Kalnichevski (JIRA)
>> <[email protected]> wrote:
>>> [
>>> https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>> ]
>>>
>>> Oleg Kalnichevski reassigned MIME4J-118:
>>> ----------------------------------------
>>>
>>> Assignee: oleg.kalnichevski
>>>
>>> Working on a patch
>>>
>>> Oleg
>>>
>>>> MIME stream parser handles non-ASCII fields incorrectly
>>>> -------------------------------------------------------
>>>>
>>>> Key: MIME4J-118
>>>> URL: https://issues.apache.org/jira/browse/MIME4J-118
>>>> Project: JAMES Mime4j
>>>> Issue Type: Bug
>>>> Reporter: Oleg Kalnichevski
>>>> Assignee: oleg.kalnichevski
>>>> Fix For: 0.6
>>>>
>>>>
>>>> Presently MIME stream parser handles non-ASCII fields incorrectly. Binary
>>>> field content gets converted to its textual representation too early in
>>>> the parsing process using simple byte to char cast. The decision about
>>>> appropriate char encoding should be left up to individual ContentHandler
>>>> implementations.
>>>> Oleg
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
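
To make the raw-bytes idea from my reply above a bit more concrete,
here is a rough sketch. It is only an illustration under my own
assumptions: RawField and all of its methods are hypothetical names and
not part of the Mime4j API. The field keeps the original bytes, parses
its name lazily, lets the caller pick the charset for the body, and
writes itself back out byte for byte.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a header field backed by its original bytes. */
final class RawField {
    // Original header line as read from the stream, without the trailing CRLF.
    private final byte[] raw;
    // Parsed on first access only (lazy parsing).
    private String name;

    RawField(byte[] raw) {
        this.raw = raw.clone();
    }

    /** Returns a copy of the original, undecoded bytes. */
    byte[] getRaw() {
        return raw.clone();
    }

    /** Field names are plain US-ASCII, so no charset guess is needed here. */
    String getName() {
        if (name == null) {
            name = new String(raw, 0, indexOfColon(), StandardCharsets.US_ASCII).trim();
        }
        return name;
    }

    /** Decodes the body with a charset chosen by the caller, e.g. a ContentHandler. */
    String getBody(Charset charset) {
        int start = indexOfColon() + 1;
        return new String(raw, start, raw.length - start, charset).trim();
    }

    /** Writes the original bytes unchanged, followed by CRLF. */
    void writeTo(OutputStream out) throws IOException {
        out.write(raw);
        out.write('\r');
        out.write('\n');
    }

    private int indexOfColon() {
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == ':') {
                return i;
            }
        }
        throw new IllegalStateException("not a header field: missing ':'");
    }
}
```

A ContentHandler that wants lenient behaviour could call
getBody(...) with whatever charset it determines, while a strict one
could stick to US-ASCII. Writing never touches the decoded form, so a
single writing mode would be enough for roundtrips.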
