[
https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723941#comment-14723941
]
Tim Allison commented on TIKA-1657:
-----------------------------------
I looked at this a bit today. I'm now backing off to putting this only in
tika-app, with the "-c" option printing to STDOUT.
In order to maintain round-trip-ability (xml -> TikaConfig -> xml), we'll need
to store a few more things, which makes things a bit ugly. We may need to
store the original "include" mime-types/parsers as well as the "exclude"
mime-types/parsers. I think we'd need to add:
#. {{getMimeRegistryResource()}} in TikaConfig (String, trivial)
#. {{getExcludedTypes()}} in ParserDecorator (fairly trivial)
#. {{getOriginalIncludedTypes()}} in ParserDecorator (trivial, but ugly)
#. {{getExcludedParsers()}} in CompositeParser (fairly trivial)
#. {{getOriginalIncludedParsers()}} in CompositeParser (trivial, but ugly)
Does this look ok? Any other recommendations? Is there a more elegant way to
represent a ParserDecorator in xml?
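To make the round-trip concern concrete, here is a rough sketch of why both the
original "include" set and the "exclude" set have to be kept. This is not Tika
code: {{DecoratedParserSketch}}, its fields and its methods are hypothetical
stand-ins, and the {{<mime>}} / {{<mime-exclude>}} element names are assumed
here for illustration only. It also skips XML escaping and uses regexes
instead of a real XML parser.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a decorated parser entry; NOT the real ParserDecorator.
final class DecoratedParserSketch {
    final String className;
    // Both sets are kept separately so the config can be written back unchanged.
    final Set<String> originalIncludedTypes;
    final Set<String> excludedTypes;

    DecoratedParserSketch(String className,
                          Collection<String> included,
                          Collection<String> excluded) {
        this.className = className;
        // TreeSet gives a stable, sorted order for the dumped XML.
        this.originalIncludedTypes = new TreeSet<>(included);
        this.excludedTypes = new TreeSet<>(excluded);
    }

    // What the parser actually handles: included minus excluded.
    Set<String> effectiveTypes() {
        Set<String> effective = new TreeSet<>(originalIncludedTypes);
        effective.removeAll(excludedTypes);
        return effective;
    }

    // Dump to XML (element names are assumptions; no escaping is done).
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<parser class=\"").append(className).append("\">\n");
        for (String type : originalIncludedTypes) {
            sb.append("  <mime>").append(type).append("</mime>\n");
        }
        for (String type : excludedTypes) {
            sb.append("  <mime-exclude>").append(type).append("</mime-exclude>\n");
        }
        sb.append("</parser>\n");
        return sb.toString();
    }

    // Read back the XML produced by toXml().
    static DecoratedParserSketch fromXml(String xml) {
        Matcher cls = Pattern.compile("<parser class=\"([^\"]+)\">").matcher(xml);
        if (!cls.find()) {
            throw new IllegalArgumentException("no <parser> element found");
        }
        return new DecoratedParserSketch(
                cls.group(1), collect(xml, "mime"), collect(xml, "mime-exclude"));
    }

    // Collect the text content of every <tag>...</tag> element.
    private static Set<String> collect(String xml, String tag) {
        Set<String> values = new TreeSet<>();
        Matcher m = Pattern.compile("<" + tag + ">([^<]*)</" + tag + ">").matcher(xml);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }
}
```

If only {{effectiveTypes()}} were stored, the dumped xml could not say which
types had been excluded, so {{fromXml(p.toXml()).toXml()}} would no longer
match what the user originally wrote.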
Plan B: store only the excluded mime-types/parsers and assume that they were
part of the original "include" set.
There may be more items that arise as I progress on this, of course.
I'd like to get this issue out of the way before working on TIKA-1508.
> Allow easier dumping of TikaConfig file from tika-core
> ------------------------------------------------------
>
> Key: TIKA-1657
> URL: https://issues.apache.org/jira/browse/TIKA-1657
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 1.11
>
>
> In TIKA-1418, we added an example for how to dump the config file so that
> users could easily modify it. I think we should go further and make this an
> option at the tika-core level with hooks for tika-app and tika-server. I
> propose adding a main() to TikaConfig that will print the xml config file
> that Tika is currently using to stdout.
> I'd like to put this into core so that e.g. Solr's DIH users can get by
> without having to download tika-app separately.
> There's every chance that I've not accounted for issues with dynamic loading
> etc. Also, I'd be ok with only having this available in tika-app and
> tika-server if there are good reasons.
> Feedback?