[ 
https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534266#comment-15534266
 ] 

Nick Burch commented on TIKA-2099:
----------------------------------

This patch removes some special handling that was put in place for 
COMPRESS-117. Would someone be able to check the age of that compress 
workaround commit against the underlying Commons Compress fix, so we know 
whether it's safe to remove?

In terms of a unit test, 
{{tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java}} is 
probably the best place to put one.

> Tar files without magic bytes are sporadically detected as text
> ---------------------------------------------------------------
>
>                 Key: TIKA-2099
>                 URL: https://issues.apache.org/jira/browse/TIKA-2099
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.11
>            Reporter: Robin Schimpf
>
> When a tar is created with 7-Zip 9.20, the magic bytes "ustar" are not added. 
> Everything seems to work fine if the tar contains Microsoft Office files. But 
> when it contains only text files, Tika sporadically recognizes it as 
> text/plain. It also seems to depend on the size of the first file in the tar, 
> which has to be several KB.
> The problem was found in version 1.11 and also exists in the latest 
> 1.14-SNAPSHOT.
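
For anyone following along who is not familiar with the tar header layout, 
here is a minimal sketch of the two checks involved. It relies only on the 
tar format itself: the "ustar" magic sits at offset 257 of the 512-byte 
header, and the header checksum is stored as octal text at offset 148. The 
class and method names ({{TarHeaderCheck}}, {{hasUstarMagic}}, 
{{hasValidChecksum}}) are hypothetical, and this is not a claim about how Tika 
or Commons Compress implements detection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not Tika or Commons Compress code).
public class TarHeaderCheck {
    static final int BLOCK_SIZE = 512;       // size of one tar header block
    static final int MAGIC_OFFSET = 257;     // "ustar" lives here when present
    static final int CHECKSUM_OFFSET = 148;  // octal checksum field
    static final int CHECKSUM_LENGTH = 8;

    // True if the header carries the "ustar" magic bytes.
    static boolean hasUstarMagic(byte[] header) {
        if (header.length < MAGIC_OFFSET + 5) {
            return false;
        }
        String magic = new String(header, MAGIC_OFFSET, 5, StandardCharsets.US_ASCII);
        return "ustar".equals(magic);
    }

    // True if the stored octal checksum matches the unsigned sum of all
    // header bytes, with the checksum field itself counted as eight spaces.
    static boolean hasValidChecksum(byte[] header) {
        if (header.length < BLOCK_SIZE) {
            return false;
        }
        long stored = 0;
        boolean sawDigit = false;
        for (int i = CHECKSUM_OFFSET; i < CHECKSUM_OFFSET + CHECKSUM_LENGTH; i++) {
            byte b = header[i];
            if (b >= '0' && b <= '7') {
                stored = stored * 8 + (b - '0');
                sawDigit = true;
            } else if (sawDigit) {
                break;            // NUL or space terminates the number
            } else if (b != ' ') {
                return false;     // only leading spaces are tolerated
            }
        }
        if (!sawDigit) {
            return false;         // e.g. an all-zero block
        }
        long computed = 0;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            boolean inChecksum = i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
            computed += inChecksum ? ' ' : (header[i] & 0xFF);
        }
        return stored == computed;
    }
}
```

A header with no "ustar" magic fails the first check but can still pass the 
second, so a detector that wants to recognize such files needs something like 
the checksum test as a fallback.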



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
