[ https://issues.apache.org/jira/browse/TIKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540703#comment-17540703 ]
Hudson commented on TIKA-3771:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #593 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/593/])
TIKA-3771: remove eml magic too common causing false positives (lfcnassif:
[https://github.com/apache/tika/commit/ed1c86a52d8e07d0d57decfe82ed73a90fb57c8e])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (edit) tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
* (add) tika-core/src/test/resources/org/apache/tika/mime/test-pngNotEml.bin
> Regression from TIKA-3687: Files wrongly detected as EML
> ---------------------------------------------------------
>
> Key: TIKA-3771
> URL: https://issues.apache.org/jira/browse/TIKA-3771
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Luís Filipe Nassif
> Assignee: Luís Filipe Nassif
> Priority: Major
> Fix For: 2.4.1
>
> Attachments: BEA498353ECFA1C440365BB434BBC228269917D7.png
>
>
> While running regression tests as part of upgrading from Tika 1.x to 2.4.0,
> I found that a few hundred samples, out of 1M files of different types, are
> now being detected as EML. This is caused by the <match value="\nX-"
> type="string" offset="0:1024"/> rule added in TIKA-3687 to the
> minShouldMatch="2" clause. Attached is a sample PNG file that triggers this
> (it also contains a \nDate: value in the first 1024 bytes, which supplies
> the second match).
> On an unrelated note, I tried to override the message/rfc822 mime
> definition with a custom-tika-mimetypes.xml on the classpath, but it had no
> effect. This used to work in Tika 1.x. Was that change intentional? I think
> user definitions should take precedence over Tika's definitions, since they
> can vary by domain or context (e.g. the same extension may be used by
> different applications). If it wasn't intentional, I'll open another issue.
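
To illustrate why the rule described above misfires, here is a minimal Java sketch of a "at least N patterns must occur in the first 1024 bytes" check. It is not Tika's implementation: the class name, the method names, the synthetic sample bytes, and the "\nSubject:" pattern are assumptions made for the example. Only the "\nX-" and "\nDate:" patterns, the 0:1024 offset range, and the threshold of 2 come from the report.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a minShouldMatch-style magic clause (hypothetical, not Tika code).
class MinShouldMatchSketch {

    // True if at least minShouldMatch patterns start at an offset in [0, maxOffset].
    static boolean looksLikeEml(byte[] data, List<String> patterns,
                                int minShouldMatch, int maxOffset) {
        int hits = 0;
        for (String p : patterns) {
            if (indexOf(data, p.getBytes(StandardCharsets.ISO_8859_1), maxOffset) >= 0) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    // Naive byte search; returns the first match offset <= maxOffset, or -1.
    static int indexOf(byte[] data, byte[] needle, int maxOffset) {
        int last = Math.min(maxOffset, data.length - needle.length);
        for (int i = 0; i <= last; i++) {
            int j = 0;
            while (j < needle.length && data[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    // Synthetic binary blob: PNG signature plus two header-like byte runs (made-up data).
    static byte[] sampleBinary() {
        byte[] data = new byte[2048];
        byte[] sig = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.arraycopy(sig, 0, data, 0, sig.length);
        byte[] x = "\nX-".getBytes(StandardCharsets.ISO_8859_1);
        byte[] date = "\nDate:".getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(x, 0, data, 100, x.length);
        System.arraycopy(date, 0, data, 500, date.length);
        return data;
    }

    public static void main(String[] args) {
        byte[] png = sampleBinary();
        // With the overly common "\nX-" pattern in the clause, the blob is flagged.
        List<String> withX = Arrays.asList("\nX-", "\nDate:", "\nSubject:");
        System.out.println(looksLikeEml(png, withX, 2, 1024));    // prints "true"
        // Without it, only "\nDate:" matches, which is below the threshold of 2.
        List<String> withoutX = Arrays.asList("\nDate:", "\nSubject:");
        System.out.println(looksLikeEml(png, withoutX, 2, 1024)); // prints "false"
    }
}
```

Because a three-byte sequence such as "\nX-" can easily occur by chance in compressed or binary data, a low threshold combined with short patterns leaves little margin against false positives, which is consistent with the commit removing the overly common magic.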
--
This message was sent by Atlassian Jira
(v8.20.7#820007)