[ 
https://issues.apache.org/jira/browse/TIKA-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Barrett updated TIKA-1665:
------------------------------
    Attachment: Anonymised Small EML attached message.msg

This is a msg file with an embedded eml message that exhibits this behaviour.

> Incorrect handling of eml files with type message/x-emlx embedded in msg files
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-1665
>                 URL: https://issues.apache.org/jira/browse/TIKA-1665
>             Project: Tika
>          Issue Type: Bug
>          Components: mime, parser
>    Affects Versions: 1.7, 1.8, 1.9
>         Environment: all (Linux, OS X, Windows)
>            Reporter: Tim Barrett
>         Attachments: Anonymised Small EML attached message.msg
>
>
> Our software uses Tika to parse large and diverse sets of customer files. 
> Amongst these files we have eml files which are embedded within msg files. 
> These eml files have a media type of message/x-emlx as detected by Media 
> Detector.
> From Tika 1.7 onwards, the binary MIME attachment data of the file within the 
> parent msg file is parsed as text; this did not happen with Tika 1.6 or prior 
> versions. This causes huge volumes of meaningless characters to be passed 
> through to our content handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
