[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825729#comment-15825729 ]
Nick Burch commented on TIKA-2241:
----------------------------------

You only need to specify a mimetype for a parser if you want to bind it to a type it wouldn't normally handle, or to specify an exclude so that it ignores a type it otherwise would handle.

If you create a custom config file that binds an extra mime type to one parser, then dump that, you should still see it listed just for that parser.

> DumpTikaConfigExample generates strange tika-config.xml
> --------------------------------------------------------
>
>                 Key: TIKA-2241
>                 URL: https://issues.apache.org/jira/browse/TIKA-2241
>             Project: Tika
>          Issue Type: Bug
>          Components: example
>    Affects Versions: 1.14
>         Environment: Apache Maven 3.2.5
> Java version: 1.8.0_112, vendor: Oracle Corporation
> Archlinux:
> OS name: "linux", version: "4.8.11-1-arch", arch: "amd64", family: "unix"
>            Reporter: Andreas Baumann
>
> {code:none|borderStyle=solid}
> mvn exec:java \
>   -Dexec.mainClass="org.apache.tika.example.DumpTikaConfigExample" \
>   -Dexec.arguments="--dump-static"
> {code}
> Tika 1.8 used to produce something like:
> {code:xml|borderStyle=solid}
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <properties>
>   <!--for example: <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml"/>-->
>   <!--for example: <translator class="org.apache.tika.language.translate.GoogleTranslator"/>-->
>   <detectors>
>     <detector class="org.gagravarr.tika.OggDetector"/>
>     <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
>     <detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
>     <detector class="org.apache.tika.mime.MimeTypes"/>
>   </detectors>
>   <parsers>
>     <parser class="org.apache.tika.parser.asm.ClassParser">
>       <mime>application/java-vm</mime>
>     </parser>
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
> ...
> {code}
> With Tika 1.14 I get:
> {code:xml|borderStyle=solid}
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <properties>
>   <!--for example: <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml"/>-->
>   <service-loader dynamic="true" loadErrorHandler="IGNORE"/>
>   <!--No translators available-->
>   <detectors>
>     <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
>     <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
>     <detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
>     <detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
>     <detector class="org.gagravarr.tika.OggDetector"/>
>     <detector class="org.gagravarr.tika.OggDetector"/>
>     <detector class="org.apache.tika.mime.MimeTypes"/>
>   </detectors>
>   <parsers>
>     <parser class="org.apache.tika.parser.apple.AppleSingleFileParser"/>
>     <parser class="org.apache.tika.parser.apple.AppleSingleFileParser"/>
>     <parser class="org.apache.tika.parser.asm.ClassParser"/>
>     <parser class="org.apache.tika.parser.asm.ClassParser"/>
>     <parser class="org.apache.tika.parser.audio.AudioParser"/>
>     <parser class="org.apache.tika.parser.audio.AudioParser"/>
> ...
> {code}
> IMHO, the problems are:
> - Why is every class listed twice?
> - As I understand it, the order of the parsers matters: do earlier ones take precedence over later ones?
> - Why are the matching MIME types missing completely?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
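To illustrate Nick Burch's comment, here is a minimal sketch of the kind of custom config it describes, modelled on the example in the Tika 1.x configuration documentation. The choice of `EmptyParser` and `application/pdf` is an illustrative assumption only: `DefaultParser` keeps its usual parsers but stops handling PDF via `<mime-exclude>`, while `<mime>` binds `EmptyParser`, which handles no types by default, to a type it would not normally handle. According to the comment, dumping a config like this should show the `<mime>` entry listed only under that one parser.

{code:xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Load all the default parsers, but exclude one type they would
         otherwise handle (application/pdf is an illustrative choice) -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/pdf</mime-exclude>
    </parser>
    <!-- EmptyParser supports no types by default; <mime> binds it to a
         type it would not normally handle (again, illustrative only) -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
{code}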