Celpan Valeria created TIKA-2694:
------------------------------------

             Summary: "From" headers is not always extracted correctly on msg mails
                 Key: TIKA-2694
                 URL: https://issues.apache.org/jira/browse/TIKA-2694
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.17
         Environment: CentOS 7, Windows 10
            Reporter: Celpan Valeria
         Attachments: Fw Anime User Analysis.msg
For some emails, the "From" field does not contain the sender's email address. Instead we get a value that looks like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.

The issue seems to be connected to the library `org.apache.poi:poi-scratchpad:3.17`: when running `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParserContext)`, we get `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.

See the attachment to reproduce this defect.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
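
For anyone triaging this, the reported value appears to be an Exchange legacy distinguished name (X.500 style) rather than an SMTP address. Below is a minimal sketch, in plain Java with no Tika or POI dependency, of how such a value could be recognized before it is emitted as "From". The class `SenderAddressCheck` and its methods are hypothetical names chosen for illustration and are not part of Tika or POI. The check is only a heuristic based on the `/O=.../CN=...` shape seen in this report, and it does not recover the real SMTP address.

```java
import java.util.Optional;

// Hypothetical helper, not part of Tika or POI: a heuristic for spotting
// Exchange-style distinguished names in a sender address field.
final class SenderAddressCheck {

    private SenderAddressCheck() {
    }

    // Case-insensitive lastIndexOf that compares in place, so indices
    // always refer to positions in the original string.
    private static int lastIndexOfIgnoreCase(String s, String needle) {
        for (int i = s.length() - needle.length(); i >= 0; i--) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Assumption: a value that starts with "/O=" and contains a "/CN="
    // component is a directory name, not an SMTP address.
    public static boolean looksLikeExchangeLegacyDn(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        return v.regionMatches(true, 0, "/O=", 0, 3)
                && lastIndexOfIgnoreCase(v, "/CN=") >= 0;
    }

    // Returns the text after the last "/CN=" (typically a mailbox alias).
    // This is NOT an email address; it is only useful for diagnostics.
    public static Optional<String> lastCommonName(String value) {
        if (!looksLikeExchangeLegacyDn(value)) {
            return Optional.empty();
        }
        String v = value.trim();
        String cn = v.substring(lastIndexOfIgnoreCase(v, "/CN=") + 4);
        return cn.isEmpty() ? Optional.empty() : Optional.of(cn);
    }

    public static void main(String[] args) {
        // Value taken verbatim from this report.
        String reported = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP "
                + "(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER";
        System.out.println(looksLikeExchangeLegacyDn(reported));
        System.out.println(lastCommonName(reported).orElse("<none>"));
        System.out.println(looksLikeExchangeLegacyDn("someone@example.com"));
    }
}
```

A check along these lines could be used to decide when to fall back to another source for the sender address. Which fallback is appropriate is left open here.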