[ https://issues.apache.org/jira/browse/TIKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581955#comment-15581955 ]
Tim Allison commented on TIKA-2122:
-----------------------------------
We'll also have to start decoding encoded words (RFC 2047) in header values, such as the From header here:
{noformat}
H: From: =?iso-8859-1?Q?L'=C9quipe_Microsoft_Outlook_Express?=
<[email protected]>
H: To: "Nouvel utilisateur de Outlook Express"
H: Subject: Microsoft Outlook Express 6
H: Date: Thu, 5 Apr 2007 09:26:06 -0700
H: MIME-Version: 1.0
H: Content-Type: text/html;
H: charset="iso-8859-1"
H: Content-Transfer-Encoding: quoted-printable
H: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
{noformat}
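
For reference, the From value above is an RFC 2047 encoded word of the form =?charset?encoding?text?=, and it should decode to "L'Équipe Microsoft Outlook Express". A minimal, JDK-only sketch of that decoding follows. The class and method names (EncodedWordDecoder, decode) are hypothetical and this is not the parser's actual code. In practice an existing mail library would probably be a better choice than hand-rolling this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika or POI.
final class EncodedWordDecoder {
    // =?charset?encoding?encoded-text?=  (RFC 2047, section 2)
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    /** Replaces every encoded word in a header value with its decoded text. */
    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean prevWasEncoded = false;
        while (m.find()) {
            String gap = headerValue.substring(last, m.start());
            // Whitespace between two adjacent encoded words is dropped (RFC 2047, section 6.2).
            if (!(prevWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            out.append(decodeWord(m.group(1), m.group(2), m.group(3)));
            last = m.end();
            prevWasEncoded = true;
        }
        out.append(headerValue.substring(last));
        return out.toString();
    }

    private static String decodeWord(String charsetName, String encoding, String text) {
        try {
            Charset charset = Charset.forName(charsetName);
            byte[] bytes = encoding.equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(text)
                    : decodeQ(text);
            return new String(bytes, charset);
        } catch (IllegalArgumentException e) {
            // Unknown charset or malformed payload: leave the word undecoded.
            return "=?" + charsetName + "?" + encoding + "?" + text + "?=";
        }
    }

    // "Q" encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write((byte) c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?iso-8859-1?Q?L'=C9quipe_Microsoft_Outlook_Express?="));
    }
}
```

The sketch ignores the optional language suffix on the charset (RFC 2231) and leaves anything it cannot decode untouched, so plain values such as the Subject above pass through unchanged.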
> Extract all email headers from Outlook .msg files into Metadata
> ---------------------------------------------------------------
>
> Key: TIKA-2122
> URL: https://issues.apache.org/jira/browse/TIKA-2122
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.13
> Reporter: Chris Knott
> Priority: Minor
> Fix For: 2.0, 1.14
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Currently most email headers are not added to the Metadata when extracting
> Outlook .msg files.
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
> The headers ({{msg.getHeaders()}}) are already looped through as a
> way to estimate the date.
> All headers should be added to Metadata, using the header name with a
> prefix such as {{"raw-header:"}}.
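
A rough sketch of what the request above could look like, using a plain Map as a stand-in for Tika's Metadata so that it runs on the JDK alone. The class name RawHeaderCollector is hypothetical, the "raw-header:" prefix is only the one suggested in the issue, and the sketch assumes the headers arrive as an array of lines in which folded continuation lines keep their leading whitespace. None of this is the actual OutlookExtractor code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the Map stands in for Tika's Metadata object.
final class RawHeaderCollector {
    // Prefix suggested in the issue; the final name is an open question.
    static final String PREFIX = "raw-header:";

    /** Groups header lines by prefixed name, keeping repeated headers as multiple values. */
    static Map<String, List<String>> collect(String[] headerLines) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : headerLines) {
            if (line.isEmpty()) {
                continue;
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && name != null) {
                // Folded continuation line: append to the current header's value.
                value.append(' ').append(line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line; skipped in this sketch
            }
            flush(out, name, value);
            name = line.substring(0, colon).trim();
            value.setLength(0);
            value.append(line.substring(colon + 1).trim());
        }
        flush(out, name, value);
        return out;
    }

    private static void flush(Map<String, List<String>> out, String name, StringBuilder value) {
        if (name != null) {
            out.computeIfAbsent(PREFIX + name, k -> new ArrayList<>()).add(value.toString());
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "Subject: Microsoft Outlook Express 6",
            "Content-Type: text/html;",
            "\tcharset=\"iso-8859-1\""
        };
        collect(lines).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keeping a list per name matters for headers that can legitimately repeat, such as Received.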
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)