Nick Burch commented on TIKA-2122:

I'm not sure we want to dump these raw into the Tika metadata; maybe with a
prefix, though? (That would probably need syncing up with the RFC822 and MBox
parsers for consistency.)

Also note that HSMF doesn't currently pull out all the possible properties at
the MSG level (support for fixed-length properties is incomplete and in need of
volunteer energy), so there may be more metadata we could get from the MSG
file "properly", which might reduce the need for this. (Pending suitable POI
work!)

> Extract all email headers from Outlook .msg files into Metadata
> ---------------------------------------------------------------
>                 Key: TIKA-2122
>                 URL: https://issues.apache.org/jira/browse/TIKA-2122
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Chris Knott
>            Priority: Minor
>             Fix For: 2.0, 1.14
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> Currently most email headers are not added to the Metadata when extracting 
> Outlook .msg files.
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
> The headers - {{msg.getHeaders()}} - are already being looped through as a 
> way to estimate the date.
> All headers should be added to Metadata, using the name of the header with a 
> prefix such as {{"raw-header:"}}
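The proposal in the issue can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the OutlookExtractor code. It assumes the headers arrive as an array of raw header lines, and it uses a plain `Map<String, List<String>>` as a stand-in for Tika's `Metadata` so the example runs without Tika or POI on the classpath. The names `RawHeaderSketch` and `extractRawHeaders`, and the choice to unfold continuation lines, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RawHeaderSketch {
    // Prefix proposed in the issue; the final name is still under discussion.
    static final String PREFIX = "raw-header:";

    // Hypothetical stand-in for Tika's Metadata: each key keeps all of its values,
    // because headers such as Received legitimately repeat.
    static Map<String, List<String>> extractRawHeaders(String[] headerLines) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        String currentName = null;
        StringBuilder currentValue = new StringBuilder();
        for (String line : headerLines) {
            if (line.isEmpty()) {
                continue;
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded continuation line (RFC 5322 section 2.2.3):
                // join it to the header currently being built, if any.
                if (currentName != null) {
                    currentValue.append(' ').append(line.trim());
                }
                continue;
            }
            // A new header starts here, so store the previous one first.
            flush(metadata, currentName, currentValue);
            int colon = line.indexOf(':');
            if (colon <= 0) {
                // Not a "Name: value" line; skip it rather than guess.
                currentName = null;
                continue;
            }
            currentName = line.substring(0, colon).trim();
            currentValue.setLength(0);
            currentValue.append(line.substring(colon + 1).trim());
        }
        flush(metadata, currentName, currentValue);
        return metadata;
    }

    // Adds the completed header under its prefixed name, appending to any earlier values.
    private static void flush(Map<String, List<String>> metadata, String name, StringBuilder value) {
        if (name != null) {
            metadata.computeIfAbsent(PREFIX + name, k -> new ArrayList<>()).add(value.toString());
        }
    }

    public static void main(String[] args) {
        String[] headers = {
            "Received: from a.example.com",
            "Received: from b.example.com",
            "Subject: Quarterly",
            "\treport",
            "X-Mailer: Microsoft Outlook 16.0"
        };
        // Prints one line per prefixed header name with all of its values.
        extractRawHeaders(headers).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In the real parser each entry would presumably go through `Metadata.add(name, value)`, which keeps repeated values, rather than `Metadata.set`, which overwrites them. Whether the prefix should be `raw-header:` or something shared with the RFC822 and MBox parsers is the open question raised in the comment.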

This message was sent by Atlassian JIRA