[
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168945#comment-15168945
]
Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM:
------------------------------------------------------------
With the handful of MSG files in our "test-documents", I get this:
{noformat}
test-outlook2003.msg
emailFromChunk:[email protected]
header_from:null
testMSG.msg
emailFromChunk:[email protected]
header_from:From: Jukka Zitting <[email protected]>
testMSG_att_doc.msg
emailFromChunk:[email protected]
header_from:null
testMSG_att_msg.msg
emailFromChunk:/O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
header_from:From: Nick Booth <[email protected]>
testMSG_chinese.msg
emailFromChunk:/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
header_from:null
testMSG_forwarded.msg
emailFromChunk:/O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
header_from:From: Paul Allan Hill <[email protected]>
{noformat}
Perhaps a strategy of trying emailFromChunk first and then backing off to a
regex on the {{From}} header, if that's there? That would get a "regular" email
address for every file above except {{testMSG_chinese.msg}}. Or is the Exchange
info useful to you as well, if that's all we can get?
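A rough sketch of that fallback, to make the proposal concrete. This is not existing Tika code: the class name {{SenderAddressResolver}}, the method {{resolve}}, and the loose address regex are all hypothetical, and the two arguments stand in for the emailFromChunk and header_from values listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or POI.
final class SenderAddressResolver {

    // Deliberately loose SMTP-style pattern (an assumption); not a full RFC 5322 parser.
    private static final Pattern SMTP_ADDRESS =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /**
     * @param emailFromChunk sender value read from the MSG chunk, may be null
     * @param headerFrom     raw "From: ..." header line, may be null
     * @return a regular address if one is found, otherwise the chunk value
     *         unchanged (an Exchange DN or null)
     */
    static String resolve(String emailFromChunk, String headerFrom) {
        // 1. Use the chunk as-is when it already is a regular address.
        if (emailFromChunk != null) {
            String trimmed = emailFromChunk.trim();
            if (SMTP_ADDRESS.matcher(trimmed).matches()) {
                return trimmed;
            }
        }
        // 2. Otherwise back off to the first address found in the From header.
        if (headerFrom != null) {
            Matcher m = SMTP_ADDRESS.matcher(headerFrom);
            if (m.find()) {
                return m.group();
            }
        }
        // 3. Nothing better available: hand back the Exchange info (or null).
        return emailFromChunk;
    }
}
```

With made-up addresses, {{resolve("/O=OEXCH018/CN=RECIPIENTS/CN=PAUL_METAJURE", "From: Paul Allan Hill <paul@example.com>")}} would return {{paul@example.com}}, while a file like {{testMSG_chinese.msg}} (Exchange DN in the chunk, no header) would come back with the DN unchanged. Whether step 3 should return the DN or null is exactly the open question above.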
was (Author: [email protected]):
With the handful of MSG files in our "test-documents", I get this:
{noformat}
test-outlook2003.msg : [email protected]
testMSG.msg : [email protected]
testMSG_att_doc.msg : [email protected]
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}
> Save sender email address in Outlook MSG metadata
> -------------------------------------------------
>
> Key: TIKA-1865
> URL: https://issues.apache.org/jira/browse/TIKA-1865
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.11
> Environment: Windows 7 x64, jre 1.8.0_60 x64
> Reporter: Luis Filipe Nassif
>
> The sender email address is lost when extracting metadata from Outlook MSG files.
> Currently only the sender name is extracted. The address is important information
> to extract for search engines.