[
https://issues.apache.org/jira/browse/TIKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581946#comment-15581946
]
Tim Allison commented on TIKA-2122:
-----------------------------------
Yes, I think this is a really good idea with a prefix -- partly because it will
expose areas for further work in .msg, and as [~gagravarr] pointed out, we
still need some volunteer energy on other properties within .msg.
I suspect that folks interested in forensics would want both the raw headers
and the other properties we might eventually pull out.
For now, how about {{raw-email-header:}}?
As an example of "areas for further work", it looks like POI may be splitting
headers at newlines or semicolons. For one of our current test files, I've
prepended each header that POI returns with "H:":
{noformat}
H: Microsoft Mail Internet Headers Version 2.0
H: Received: from hq-ex3fe3.ptcnet.ptc.com ([132.253.201.67]) by
HQ-MAIL3.ptcnet.ptc.com with Microsoft SMTPSVC(6.0.3790.3959);
H: Thu, 29 Jan 2009 14:17:10 -0500
H: Received: from irp1.ptc.com ([12.11.148.83]) by hq-ex3fe3.ptcnet.ptc.com
with Microsoft SMTPSVC(6.0.3790.3959);
H: Thu, 29 Jan 2009 14:17:10 -0500
H: X-IronPort-Anti-Spam-Filtered: true
H: X-IronPort-Anti-Spam-Result:
AskBALePgUmM0wsCk2dsb2JhbACMeYZdPwEBAQEJCQoJEQWpcoEDjWwBAwEDhA0G
H: X-IronPort-AV: E=Sophos;i="4.37,346,1231131600";
H: d="scan'208";a="51369639"
{noformat}
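
To make the idea concrete, here is a rough sketch of how the split lines above could be stitched back together and stored under the proposed prefix. This is not the actual OutlookExtractor change. The class name RawHeaderSketch, the method collect, and the regex heuristic are all made up for illustration, and a plain Map stands in for Tika's Metadata so the snippet runs on its own. The heuristic assumes that a line starting with a token immediately followed by a colon begins a new header, and that anything else is a continuation of the previous one. That would misfire on a continuation line that itself begins with something like "foo:".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RawHeaderSketch {
    // Prefix proposed in the comment above.
    static final String PREFIX = "raw-email-header:";

    // Heuristic (an assumption, not POI behavior): a field name is a run of
    // printable ASCII without spaces or ':', immediately followed by ':'.
    private static final Pattern FIELD =
            Pattern.compile("^([!-9;-~]+):\\s*(.*)$");

    // Joins continuation lines onto the preceding header and groups values
    // by prefixed header name, preserving order and repeated headers.
    public static Map<String, List<String>> collect(String[] lines) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : lines) {
            Matcher m = FIELD.matcher(line);
            if (m.matches()) {
                flush(out, name, value);
                name = m.group(1);
                value.setLength(0);
                value.append(m.group(2).trim());
            } else if (name != null) {
                // Continuation of the current header: join with one space.
                if (value.length() > 0) {
                    value.append(' ');
                }
                value.append(line.trim());
            }
            // Lines before the first header (e.g. the "Microsoft Mail
            // Internet Headers" banner) are ignored.
        }
        flush(out, name, value);
        return out;
    }

    private static void flush(Map<String, List<String>> out,
                              String name, StringBuilder value) {
        if (name == null) {
            return;
        }
        out.computeIfAbsent(PREFIX + name, k -> new ArrayList<>())
           .add(value.toString());
    }
}
```

In the parser itself, each collected value would presumably go into the Metadata object via Metadata.add(name, value) rather than into a Map, so that repeated headers such as Received keep all of their values.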
> Extract all email headers from Outlook .msg files into Metadata
> ---------------------------------------------------------------
>
> Key: TIKA-2122
> URL: https://issues.apache.org/jira/browse/TIKA-2122
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.13
> Reporter: Chris Knott
> Priority: Minor
> Fix For: 2.0, 1.14
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Currently most email headers are not added to the Metadata when extracting
> Outlook .msg files.
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
> The headers - {{msg.getHeaders()}} - are already being looped through as a
> way to estimate the date.
> All headers should be added to Metadata, using the name of the header with a
> prefix such as {{"raw-header:"}}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)