Tim Barrett created TIKA-1665:
---------------------------------

             Summary: Incorrect handling of eml files with type message/x-emlx embedded in msg files
                 Key: TIKA-1665
                 URL: https://issues.apache.org/jira/browse/TIKA-1665
             Project: Tika
          Issue Type: Bug
          Components: mime, parser
    Affects Versions: 1.9, 1.8, 1.7
         Environment: all (Linux, OS X, Windows)
            Reporter: Tim Barrett


Our software uses Tika to parse large and diverse sets of customer files. 
Amongst these files we have eml files which are embedded within msg files. 
These eml files have a media type of message/x-emlx as detected by Media 
Detector.

From Tika 1.7 onwards, the binary MIME attachment data of the file within the 
parent msg file is parsed as text. This did not happen with Tika 1.6 or prior 
versions. It causes huge volumes of meaningless characters to be passed 
through to our content handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
