[
https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729573#comment-14729573
]
Nick Burch commented on TIKA-1657:
----------------------------------
Let's consider this config file:
{{{
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
<mime-exclude>image/jpeg</mime-exclude>
<mime-exclude>application/pdf</mime-exclude>
<parser-exclude
class="org.apache.tika.parser.executable.ExecutableParser"/>
<parser-exclude
class="org.apache.tika.parser.executable.ExecutableParser2"/>
</parser>
<parser class="org.apache.tika.parser.EmptyParser">
<mime>application/pdf</mime>
<no-mime>hello/world</no-mime>
</parser>
</parsers>
</properties>
}}}
With {{--dump-active-config}} you'd get back only the parts of that file that
Tika actually used, letting you spot what was and wasn't picked up, e.g.:
{{{
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
<mime-exclude>image/jpeg</mime-exclude>
<mime-exclude>application/pdf</mime-exclude>
<parser-exclude
class="org.apache.tika.parser.executable.ExecutableParser"/>
</parser>
<parser class="org.apache.tika.parser.EmptyParser">
<mime>application/pdf</mime>
</parser>
</parsers>
</properties>
}}}
Or, with {{--dump-static-config}} you'd get something like:
{{{
<properties>
<service-loader dynamic="false" />
<translators/>
<detectors>
<detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
<detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
<detector class="org.gagravarr.tika.OggDetector"/>
<detector class="org.apache.tika.mime.MimeTypes"/>
</detectors>
<parsers>
<parser class="org.apache.tika.parser.CompositeParser">
<mime-exclude>image/jpeg</mime-exclude>
<mime-exclude>application/pdf</mime-exclude>
<parser class="org.apache.tika.parser.asm.ClassParser"/>
<parser class="org.apache.tika.parser.audio.AudioParser"/>
<parser class="org.apache.tika.parser.audio.MidiParser"/>
<parser class="org.apache.tika.parser.chm.ChmParser"/>
<parser class="org.apache.tika.parser.code.SourceCodeParser"/>
... everything except executable ...
</parser>
<parser class="org.apache.tika.parser.EmptyParser">
<mime>application/pdf</mime>
</parser>
</parsers>
</properties>
}}}
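As a rough illustration of the "spot what was and wasn't used" idea, here is a minimal sketch that compares a supplied config with an active-config dump and lists the elements that did not survive. It uses only the JDK's DOM parser, not any Tika API. The class name {{ConfigDiff}}, its methods, and the trimmed-down XML strings are all hypothetical and for illustration only. It also assumes the active dump keeps the same element names and nesting as the input, as in the first two listings above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: lists config elements that are
// present in the supplied config but absent from the active-config dump.
final class ConfigDiff {

    // Trimmed-down copies of the example configs (assumed shape).
    static final String SUPPLIED =
        "<properties><parsers>"
        + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
        + "<parser-exclude class=\"org.apache.tika.parser.executable.ExecutableParser\"/>"
        + "</parser>"
        + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
        + "<mime>application/pdf</mime><no-mime>hello/world</no-mime>"
        + "</parser>"
        + "</parsers></properties>";

    static final String ACTIVE =
        "<properties><parsers>"
        + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
        + "<parser-exclude class=\"org.apache.tika.parser.executable.ExecutableParser\"/>"
        + "</parser>"
        + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
        + "<mime>application/pdf</mime>"
        + "</parser>"
        + "</parsers></properties>";

    // One signature per element, in document order. A signature is the path of
    // tag names from the root, plus the class attribute and leaf text if any.
    static List<String> signatures(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse config XML", e);
        }
    }

    private static void walk(Element e, String parent, List<String> out) {
        String sig = parent + "/" + e.getTagName();
        if (e.hasAttribute("class")) {
            sig += "[class=" + e.getAttribute("class") + "]";
        }
        // Collect child elements, skipping whitespace and other text nodes.
        List<Element> children = new ArrayList<>();
        NodeList nodes = e.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                children.add((Element) nodes.item(i));
            }
        }
        // Leaf elements such as <mime> carry their value as text.
        String text = e.getTextContent().trim();
        if (children.isEmpty() && !text.isEmpty()) {
            sig += "=" + text;
        }
        out.add(sig);
        for (Element child : children) {
            walk(child, sig, out);
        }
    }

    // Signatures that appear in the supplied config but not in the active one.
    static List<String> dropped(String supplied, String active) {
        Set<String> kept = new HashSet<>(signatures(active));
        List<String> result = new ArrayList<>();
        for (String sig : signatures(supplied)) {
            if (!kept.contains(sig)) {
                result.add(sig);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String sig : dropped(SUPPLIED, ACTIVE)) {
            System.out.println("not used: " + sig);
        }
    }
}
```

With the two strings above, the only element reported is the {{no-mime}} entry under {{EmptyParser}}. A path-based comparison like this would not work against the static dump, where {{DefaultParser}} is expanded into a {{CompositeParser}} with explicit children.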
> Allow easier XML serialization of TikaConfig
> --------------------------------------------
>
> Key: TIKA-1657
> URL: https://issues.apache.org/jira/browse/TIKA-1657
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1558-blacklist-effective.xml
>
>
> In TIKA-1418, we added an example for how to dump the config file so that
> users could easily modify it. I think we should go further and make this an
> option at the tika-core level with hooks for tika-app and tika-server. I
> propose adding a main() to TikaConfig that will print the xml config file
> that Tika is currently using to stdout.
> I'd like to put this into core so that e.g. Solr's DIH users can get by
> without having to download tika-app separately.
> There's every chance that I've not accounted for issues with dynamic loading
> etc. Also, I'd be ok with only having this available in tika-app and
> tika-server if there are good reasons.
> Feedback?