[https://issues.apache.org/jira/browse/TIKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558431#comment-16558431]
Ross Johnson edited comment on TIKA-2694 at 7/26/18 3:31 PM:
-------------------------------------------------------------
Just adding some extra info. I checked the attached .msg file, and indeed the
sender MAPI properties contain only the X.500 (Exchange) sender address:
{code:java}
PidTagSenderName (0x0C1A) String (0x001F)
"Berger, Eric"
PidTagSenderAddressType (0x0C1E) String (0x001F)
"EX"
PidTagSenderEmailAddress (0x0C1F) String (0x001F)
"/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"
{code}
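As a rough illustration, a parser could detect this case before reporting the sender as an email address. The sketch below assumes the PidTagSenderAddressType and PidTagSenderEmailAddress values have already been read into plain strings (it does not use the POI API), and the class and method names are hypothetical. "EX" is the address type for Exchange addresses like the one above, while internet addresses use "SMTP"; the '@' check is only a heuristic. The last CN component is returned separately because, in this message, it happens to be the alias (EBERGER); that may not hold for every Exchange DN.

```java
class SenderAddressCheck {

    /**
     * Heuristic (hypothetical helper): true if the PidTagSenderAddressType /
     * PidTagSenderEmailAddress pair looks like a plain internet (SMTP) address.
     */
    static boolean isSmtpAddress(String addressType, String emailAddress) {
        if (emailAddress == null || emailAddress.isEmpty()) {
            return false;
        }
        // "EX" marks an Exchange address stored as an X.500-style DN ("/O=.../CN=...").
        if ("EX".equalsIgnoreCase(addressType) || emailAddress.startsWith("/")) {
            return false;
        }
        // Require exactly one '@' with something on both sides.
        int at = emailAddress.indexOf('@');
        return at > 0 && at == emailAddress.lastIndexOf('@') && at < emailAddress.length() - 1;
    }

    /** Returns the text after the last "/CN=" (case-insensitive), or null if there is none. */
    static String lastCommonName(String x500) {
        if (x500 == null) {
            return null;
        }
        for (int i = x500.length() - 4; i >= 0; i--) {
            if (x500.regionMatches(true, i, "/CN=", 0, 4)) {
                return x500.substring(i + 4);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String dn = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)"
                + "/CN=RECIPIENTS/CN=EBERGER";
        System.out.println(isSmtpAddress("EX", dn));                   // false
        System.out.println(lastCommonName(dn));                        // EBERGER
        System.out.println(isSmtpAddress("SMTP", "eric@example.com")); // true
    }
}
```

The address eric@example.com is a placeholder, not the address from the attached message.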
However, the regular SMTP address is present in the
PidTagTransportMessageHeaders property:
{code:java}
From: "Berger, Eric" <[email protected]>
{code}
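A minimal sketch of pulling the "From:" address out of the PidTagTransportMessageHeaders string, assuming the property value is already available as plain text. It unfolds continuation lines, stops at the first blank line, and collects angle-bracketed addresses, falling back to a bare user@host token. The class name is hypothetical, and this is not a full RFC 5322 parser: comments, group syntax and similar edge cases are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransportHeaderFrom {

    // Address inside angle brackets, e.g. "Berger, Eric" <eric@example.com>
    private static final Pattern ANGLE_ADDR = Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>");
    // Fallback for a bare address, e.g. From: eric@example.com
    private static final Pattern BARE_ADDR = Pattern.compile("([^\\s<>\",;]+@[^\\s<>\",;]+)");

    /** Unfolds header lines: a line starting with space/tab continues the previous one. */
    static List<String> unfold(String rawHeaders) {
        List<String> headers = new ArrayList<>();
        for (String line : rawHeaders.split("\\r?\\n")) {
            if (line.isEmpty()) {
                if (headers.isEmpty()) {
                    continue;  // skip leading blank lines
                }
                break;         // a blank line ends the header block
            }
            boolean continuation = line.startsWith(" ") || line.startsWith("\t");
            if (continuation && !headers.isEmpty()) {
                int last = headers.size() - 1;
                headers.set(last, headers.get(last) + " " + line.trim());
            } else {
                headers.add(line);
            }
        }
        return headers;
    }

    /** Collects the addresses of every "From:" header, in order of appearance. */
    static List<String> fromAddresses(String rawHeaders) {
        List<String> result = new ArrayList<>();
        for (String header : unfold(rawHeaders)) {
            if (!header.regionMatches(true, 0, "From:", 0, 5)) {
                continue;
            }
            String value = header.substring(5);
            int before = result.size();
            Matcher angle = ANGLE_ADDR.matcher(value);
            while (angle.find()) {
                result.add(angle.group(1));
            }
            if (result.size() == before) {  // no <...> form found: try bare addresses
                Matcher bare = BARE_ADDR.matcher(value);
                while (bare.find()) {
                    result.add(bare.group(1));
                }
            }
        }
        return result;
    }
}
```

Returning a list rather than a single string keeps the case of several "From:" addresses visible to the caller instead of silently picking one.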
It may be possible to use the information from PidTagTransportMessageHeaders as
a backup or alternative, but in my experience, reconciling the header information
with the MAPI properties is a bit of a rabbit hole. Care must be taken to match
up the "From:" and "Sender:" headers with the PidTagSender* and
PidTagSentRepresenting* properties, which don't correspond to them 1:1.
Furthermore, there may be multiple "From:" addresses, whereas the MAPI
properties store only one of them. I've also seen MSG files where the stored
headers seem totally unrelated to the stored MAPI properties, although this is
(hopefully) a very rare occurrence.
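One conservative way to use the headers only as a fallback, sketched under the assumption that there is no delegation (so "From:" and "Sender:" name the same person and the PidTagSender* and PidTagSentRepresenting* properties agree): keep the MAPI value when it is already SMTP, and otherwise accept a header address only when exactly one "From:" mailbox has the same display name as PidTagSenderName. Anything ambiguous, including headers that look unrelated, falls back to the original X.500 value. All names are hypothetical, and the input is assumed to be an already unfolded "From:" header value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SenderResolver {

    /** One mailbox from a "From:" header value. */
    static final class Mailbox {
        final String name;     // display name, possibly empty
        final String address;  // addr-spec taken from inside <...>

        Mailbox(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    // Matches "Quoted, Name" <addr> or Unquoted Name <addr>; bare addresses are skipped.
    private static final Pattern MAILBOX = Pattern.compile(
            "(?:\"([^\"]*)\"|([^\",<]*?))\\s*<([^<>\\s]+@[^<>\\s]+)>");

    /** Splits a "From:" header value into mailboxes (commas inside quotes are kept). */
    static List<Mailbox> parseMailboxes(String fromValue) {
        List<Mailbox> out = new ArrayList<>();
        Matcher m = MAILBOX.matcher(fromValue);
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            out.add(new Mailbox(name.trim(), m.group(3)));
        }
        return out;
    }

    /**
     * Returns an SMTP sender address when one can be chosen unambiguously,
     * otherwise the original MAPI value (e.g. the X.500 address) unchanged.
     */
    static String resolveSender(String mapiName, String mapiType, String mapiAddress,
                                String headerFromValue) {
        if ("SMTP".equalsIgnoreCase(mapiType)) {
            return mapiAddress;  // MAPI already holds an SMTP address
        }
        if (headerFromValue == null || mapiName == null) {
            return mapiAddress;  // nothing to cross-check against
        }
        List<String> candidates = new ArrayList<>();
        for (Mailbox mb : parseMailboxes(headerFromValue)) {
            if (mb.name.equalsIgnoreCase(mapiName.trim())) {
                candidates.add(mb.address);
            }
        }
        // Accept only a single, unambiguous display-name match.
        return candidates.size() == 1 ? candidates.get(0) : mapiAddress;
    }
}
```

Requiring a display-name match trades recall for safety: a message whose headers are unrelated to its MAPI properties keeps the X.500 value rather than gaining a wrong SMTP address. Handling delegated mail would need "From:" compared against the PidTagSentRepresenting* properties and "Sender:" against the PidTagSender* properties separately.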
> "From" headers is not always extracted correctly on msg mails
> -------------------------------------------------------------
>
> Key: TIKA-2694
> URL: https://issues.apache.org/jira/browse/TIKA-2694
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.17
> Environment: CentOS 7
> Windows 10
> Reporter: Celpan Valeria
> Priority: Major
> Attachments: Fw Anime User Analysis.msg
>
>
> For some emails, instead of the email address, the "From" field contains a value
> like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.
> The issue seems to be connected to the library
> `org.apache.poi:poi-scratchpad:3.17`: when running
> `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParseContext)`,
> we get
> `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
> Check the attachment to reproduce this defect.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)