https://bz.apache.org/bugzilla/show_bug.cgi?id=63462
Bug ID: 63462
Summary: Problems with MAPIMessage.guess7BitEncoding/MAPIMessage.getHtmlBody
Product: POI
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: HSMF
Assignee: dev@poi.apache.org
Reporter: dominik.hoe...@fabasoft.com
Target Milestone: ---

Created attachment 36597
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36597&action=edit
Example MSG files with different code pages

Some e-mails run into encoding problems when the subject, text body or HTML body is read and MAPIMessage.guess7BitEncoding is used.

Example: an e-mail defines PR_INTERNET_CPID -> UTF-8, PR_MESSAGE_LOCALE_ID -> 1031, PR_MESSAGE_CODEPAGE -> undefined, and has no headers.

* Outlook wants PR_SUBJECT to be CP1252, as PR_INTERNET_CPID applies only to PR_BODY and PR_BODY_HTML (currently read as UTF-8, because guess7BitEncoding sets this)
* Outlook wants binary PR_BODY_HTML to be UTF-8 (currently read as CP1252, because getHtmlBody ignores any code page when the property is binary)
* Outlook wants ASCII PR_BODY_HTML to be UTF-8 (currently correct)
* Outlook wants PR_BODY to be CP1252, for an unknown reason (currently read as UTF-8, because guess7BitEncoding sets this)

According to the docs, PR_INTERNET_CPID may only be used to indicate the code page for PR_BODY and PR_BODY_HTML:
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtaginternetcodepage-canonical-property

In my tests Outlook never looks at the charset information inside the HTML; it relies only on PR_INTERNET_CPID.

If PR_MESSAGE_CODEPAGE is undefined and no headers are present, the default ANSI code page for the locale defined by PR_MESSAGE_LOCALE_ID may be the only hint for finding the correct code page, as PR_INTERNET_CPID applies only to the text/HTML body.

Suggestion: https://github.com/apache/poi/pull/149
(With this patch all existing unit tests succeed without modification.)

Attachments: MSG files in which the text body and HTML body should be decoded correctly.
Outlook displays them as expected.

Regards,
Dominik
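
For illustration, the code page selection described in this report could be sketched roughly as below. This is not the code from the pull request and not part of the POI API: the class `CodePageSketch`, its methods, the `UNDEFINED` marker and the two-entry LCID table are all hypothetical, and the fallback to CP1252 for unknown locales is an assumption. It only mirrors the behaviour observed for the example message (PR_INTERNET_CPID used for PR_BODY_HTML; PR_MESSAGE_CODEPAGE, then the locale's ANSI code page, used for PR_SUBJECT and PR_BODY).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Apache POI. */
final class CodePageSketch {

    /** The string property that is about to be decoded. */
    enum Property { SUBJECT, BODY, BODY_HTML }

    /** Marker used in this sketch for a property that is not set on the message. */
    static final int UNDEFINED = -1;

    /** Deliberately tiny LCID -> default ANSI code page table; a real table needs many more entries. */
    private static final Map<Integer, Integer> ANSI_CODE_PAGE_BY_LCID = Map.of(
            1031, 1252,   // de-DE
            1033, 1252);  // en-US

    /** Picks a Windows code page number for the given property. */
    static int codePageFor(Property property, int internetCpid,
                           int messageCodepage, int messageLocaleId) {
        // Observed in the report: PR_INTERNET_CPID decides the HTML body encoding.
        if (property == Property.BODY_HTML && internetCpid != UNDEFINED) {
            return internetCpid;
        }
        // Otherwise prefer PR_MESSAGE_CODEPAGE when it is present.
        if (messageCodepage != UNDEFINED) {
            return messageCodepage;
        }
        // Last hint: default ANSI code page of PR_MESSAGE_LOCALE_ID.
        // Falling back to 1252 for unknown locales is an assumption of this sketch.
        return ANSI_CODE_PAGE_BY_LCID.getOrDefault(messageLocaleId, 1252);
    }

    /** Maps the few code pages used in this sketch to Java charsets. */
    static Charset toCharset(int codePage) {
        switch (codePage) {
            case 65001: return StandardCharsets.UTF_8;
            case 1252:  return Charset.forName("windows-1252");
            default:
                throw new IllegalArgumentException("Code page not covered by this sketch: " + codePage);
        }
    }

    /** Decodes raw property bytes with the code page chosen for that property. */
    static String decode(byte[] raw, Property property, int internetCpid,
                         int messageCodepage, int messageLocaleId) {
        int codePage = codePageFor(property, internetCpid, messageCodepage, messageLocaleId);
        return new String(raw, toCharset(codePage));
    }

    public static void main(String[] args) {
        // The example from the report: PR_INTERNET_CPID = UTF-8 (65001),
        // PR_MESSAGE_CODEPAGE undefined, PR_MESSAGE_LOCALE_ID = 1031.
        for (Property p : Property.values()) {
            System.out.println(p + " -> code page " + codePageFor(p, 65001, UNDEFINED, 1031));
        }
    }
}
```

With the values from the example message, the sketch selects CP1252 for the subject and text body and UTF-8 for the HTML body, which matches what Outlook displays for the attached files.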