[ https://issues.apache.org/jira/browse/TIKA-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Barrett updated TIKA-1665:
------------------------------
Attachment: Anonymised Small EML attached message.msg
This is a msg file with an embedded eml message that exhibits this behaviour.
> Incorrect handling of eml files with type message/x-emlx embedded in msg files
> ------------------------------------------------------------------------------
>
> Key: TIKA-1665
> URL: https://issues.apache.org/jira/browse/TIKA-1665
> Project: Tika
> Issue Type: Bug
> Components: mime, parser
> Affects Versions: 1.7, 1.8, 1.9
> Environment: all (Linux, OS X, Windows)
> Reporter: Tim Barrett
> Attachments: Anonymised Small EML attached message.msg
>
>
> Our software uses Tika to parse large and diverse sets of customer files.
> Among these files are eml files that are embedded within msg files.
> These eml files have a media type of message/x-emlx as detected by Media
> Detector.
> From Tika 1.7 onwards, the binary MIME attachment data of the file within the
> parent msg file is parsed as text. This did not happen with Tika 1.6 or prior
> versions. As a result, huge volumes of meaningless characters are passed
> through to our content handler.
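
Until the parsing itself is fixed, a consuming application could screen the
text it receives before indexing it. The following is only a minimal sketch of
such a stopgap and is not part of Tika: the class GarbageTextCheck, its methods
and the threshold are hypothetical names chosen for illustration. It assumes
that the binary data shows up as control characters or U+FFFD replacement
characters in the extracted text, which may not hold for every file.

```java
// Hypothetical helper, not a Tika API: flags extracted text that looks like
// binary data decoded as characters.
final class GarbageTextCheck {

    private GarbageTextCheck() {
    }

    /**
     * Returns the fraction of characters that rarely occur in ordinary prose:
     * ISO control characters other than tab, line feed and carriage return,
     * plus the Unicode replacement character U+FFFD.
     */
    public static double suspiciousRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        int suspicious = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ordinaryWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c == '\uFFFD' || (Character.isISOControl(c) && !ordinaryWhitespace)) {
                suspicious++;
            }
        }
        return (double) suspicious / text.length();
    }

    /** True when the suspicious fraction exceeds the caller-chosen threshold. */
    public static boolean looksLikeBinary(String text, double threshold) {
        return suspiciousRatio(text) > threshold;
    }
}
```

A content handler could buffer the characters it receives for each embedded
document and discard the buffer when looksLikeBinary returns true. The
threshold is an assumption that would need tuning against real data.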
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)