[ 
https://issues.apache.org/jira/browse/TIKA-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277257#comment-17277257
 ] 

Nick Burch commented on TIKA-3290:
----------------------------------

We did some work fairly recently to increase the chances of real emails with 
unusual headers being correctly detected, e.g. TIKA-3106 and TIKA-2594. I 
suspect one of those is behind the change in detection. (We now look further 
into the file for 2+ valid email headers, so that files which start with 
large non-standard headers are still detected.)
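
As a rough illustration of that idea (this is not Tika's actual detector 
code; the class name, the method names, the header list and the scan limit 
are all assumptions made for this sketch), a "look further into the file for 
2+ known headers" check could behave like this, which also shows why free 
text placed before the first real email does not stop the file from being 
treated as mail:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only: NOT the logic Tika ships, just the general idea.
class HeaderHeuristicSketch {

    // Assumed list of header names; the real detector's list may differ.
    private static final Set<String> KNOWN_HEADERS = Set.of(
            "from", "to", "cc", "subject", "date",
            "message-id", "received", "mime-version", "content-type");

    // A header-like line: a name, a colon, then a space or tab.
    private static final Pattern HEADER_LINE =
            Pattern.compile("^([A-Za-z][A-Za-z0-9-]*):[ \\t]");

    // Count lines within the first maxChars characters that start with a known header.
    static int countHeaderLines(String text, int maxChars) {
        String head = text.length() > maxChars ? text.substring(0, maxChars) : text;
        int count = 0;
        for (String line : head.split("\\r?\\n")) {
            Matcher m = HEADER_LINE.matcher(line);
            if (m.find() && KNOWN_HEADERS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                count++;
            }
        }
        return count;
    }

    // Treat the text as email-like once at least two known headers are seen.
    static boolean looksLikeEmail(String text, int maxChars) {
        return countHeaderLines(text, maxChars) >= 2;
    }

    public static void main(String[] args) {
        String sample = "Some notes pasted before the first message.\n"
                + "More free text.\n"
                + "From: alice@example.com\n"
                + "Subject: hello\n"
                + "\n"
                + "Body text\n";
        // A wide scan window reaches the headers; a narrow one does not.
        System.out.println(looksLikeEmail(sample, 8192));
        System.out.println(looksLikeEmail(sample, 20));
    }
}
```

With a wide scan window the leading free text is skipped over and the two 
headers further down are enough to flag the file as mail; with a narrow 
window the same file would fall back to plain text.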

The file does look, to me at least, like a bunch of emails where the first one 
is missing its headers. Maybe we need to make the mail parser more forgiving 
for cases like this, where there's text before the first "real" email?

Paging our email handling experts [~lfcnassif] [~tallison] !

> Extension reading it as eml instead of txt
> ------------------------------------------
>
>                 Key: TIKA-3290
>                 URL: https://issues.apache.org/jira/browse/TIKA-3290
>             Project: Tika
>          Issue Type: Bug
>          Components: core, mime
>    Affects Versions: 1.25
>            Reporter: Vamsi Molli
>            Priority: Major
>              Labels: tika-parsers
>             Fix For: 1.24.1
>
>         Attachments: test_sample_message.txt
>
>
> The attached file is being detected as eml instead of txt. With version 
> 1.24.1 it was detected as txt, but since the upgrade to 1.25 it is 
> detected as eml, so parsing fails with a mail corrupted error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
