On Sun, Dec 4, 2011 at 12:16 AM, Tom Hutchison <[email protected]> wrote:
> Is it possible some malformed email could be causing a parsing error? What I
> am getting at. If I have 250 emails in a folder, how is it the run on the
> folder is writing 260. The extra ten being date and subject blank,
> sometimes, and sometimes, with or without content.

Your problem is likely related to the following FAQ entry:

  http://www.mhonarc.org/MHonArc/doc/faq/archives.html#split

> When the parser reads them, is it possible Mhonarc is picking up on
> malformed reply quotes and thinks they are new emails within the actual
> email? So instead of 4 emails in the above example, it thinks there are 6.
> Garbage in, garbage out comes to mind.
>
> I did solve the broken HTML, not very efficiently with Outlook 2010 as it
> does allow for a stripping of all HTML code by setting the open email to
> “edit” then choosing “plain text” after you edit anything in the body of

IIRC, Outlook allows a text/plain alternative to be generated along
with the HTML part. You can use the MIMEALTPREFS resource, as noted
in the FAQ, to give higher precedence to text/plain over text/html.

> the email. Even if it is just a carriage return or a space. Close the email
> and save on exit and the whole email is rewritten, stripping out all HTML
> and resetting the header information to show “plain/text” and whatever you
> have the encoding set to. Stripping out all HTML from the emails was the
> only way I could think of to solve the unclosed <table> attribute in quite a
> few emails which was causing problems with the msgxxx.html pages.
>
> It’s long past time for standardized header and html format for email. If
> anything it might secure them more...

text/enriched was created a long time ago to provide enhanced
formatting of email messages, but it faded away when the Web grew
and HTML became a de facto markup format for "enriched" text.

IMO, it is inexcusable for major software/services organizations
to generate such malformed HTML. Dealing with malicious HTML is
one thing, but when non-malicious-generated HTML is so badly formatted
(when it should not be), it makes the lives of consumers of such
content much more difficult.

--ewh
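
For reference, the MIMEALTPREFS suggestion above might look roughly like this in a MHonArc resource file. This is only a sketch, assuming the messages really do carry both a text/plain and a text/html part inside multipart/alternative; it has no effect on HTML-only messages. Types are listed in order of preference, most preferred first. Check the MIMEALTPREFS resource page in the MHonArc documentation for the exact behavior in your version.

```
<!-- Sketch only: prefer the plain-text alternative over the HTML one.
     Assumes each message includes a text/plain part to fall back on. -->
<MIMEAltPrefs>
text/plain
text/html
</MIMEAltPrefs>
```

The resource file (the name is up to you) would then be passed to mhonarc with the -rcfile option when the archive is rebuilt.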
