[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337531#comment-16337531 ]
Andreas Meier commented on TIKA-2527:
-------------------------------------

I attached a patch to address the problems mentioned. I also added three new mime-type sections, for application/x-lz4, image/x-tga and audio/x-caf.

The image/x-tga section had to be placed before the application/x-123 mime-type recognition, because the leading bytes overlap in some cases. The important part of the image/x-tga recognition is the inner match, which searches for the trailing signature:

54 52 55 45 56 49 53 49   TRUEVISI
4F 4E 2D 58 46 49 4C 45   ON-XFILE
2E 00                     ..

Is there an easier way to search for trailing magic strings than using a regex? I thought that a regex on its own might be too expensive for recognizing image/x-tga, so I combined it with the basic TGA recognition from the Linux magic file.

While testing tika-mimetypes.xml, I often believed that a match pattern was already correct when the actual recognition had been done via the file extension. I therefore had to remove the file extensions of my test files to validate the match patterns. To avoid this, I suggest either creating a test case that checks only the matches, without taking file extensions into account, or removing the file extensions of the test files. Is there a test case that does this already?

If you have any questions or suggestions, I would be glad to hear them.
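For illustration only, the trailing-signature check described above can be sketched outside of Tika in plain Java. This is not the code from the attached patch and does not use Tika's magic-matching API; the class name TgaFooterCheck and the method hasTgaFooter are hypothetical. It compares the last 18 bytes of a buffer with the byte sequence quoted above, so it needs no regex. A file that lacks this trailing signature would not be matched by such a check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: checks for the trailing TGA signature at the end of a buffer. */
final class TgaFooterCheck {

    // "TRUEVISION-XFILE." followed by a NUL byte: the 18 bytes quoted in the comment.
    static final byte[] SIGNATURE =
            "TRUEVISION-XFILE.\0".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the buffer ends with the 18-byte signature. */
    static boolean hasTgaFooter(byte[] data) {
        if (data == null || data.length < SIGNATURE.length) {
            return false; // too short to contain the signature
        }
        int from = data.length - SIGNATURE.length;
        byte[] tail = Arrays.copyOfRange(data, from, data.length);
        return Arrays.equals(tail, SIGNATURE);
    }

    public static void main(String[] args) {
        // Build a dummy buffer that ends with the signature (not a real TGA image).
        byte[] sample = new byte[64];
        System.arraycopy(SIGNATURE, 0, sample,
                sample.length - SIGNATURE.length, SIGNATURE.length);

        System.out.println(hasTgaFooter(sample));       // buffer with signature
        System.out.println(hasTgaFooter(new byte[64])); // buffer without signature
    }
}
```

Because the check works on bytes alone, it is unaffected by the file name, which is the property the proposed extension-free test case would need to verify.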

> Typos in tika-mimetypes.xml
> ---------------------------
>
>                 Key: TIKA-2527
>                 URL: https://issues.apache.org/jira/browse/TIKA-2527
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>   Affects Versions: 2.0, 1.16, 1.17, 1.18
>         Environment: ALL
>            Reporter: Andreas Meier
>            Priority: Minor
>         Attachments: fix-for-TIKA2527-contributed-by-AMeier-Fixed-adpcmmi.patch
>
>
> Are these mimetypes in tika-mimetypes.xml
> audio/x-adbcm instead of audio/x-adpcm
> {code:xml} <mime-type type="audio/x-adbcm">{code}
> and
> audio/x-dec-adbcm instead of audio/x-dec-adpcm
> {code:xml} <mime-type type="audio/x-dec-adbcm">{code}
> intended?
> Couldn't find these mimetypes.
> Regards
> Andreas

--
This message was sent by Atlassian JIRA (v7.6.3#76005)