[ https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026526#comment-14026526 ]
Nick Burch commented on TIKA-411:
---------------------------------
I'd suggest just using the Tika App, as its --list-<foo> options should
provide most of what you need. Or ask the Tika Server nicely: it offers the
list as plain text, HTML or JSON, and the JSON should be fairly easy to
process in code!
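As a rough sketch of the JSON route: the snippet below walks a parser listing
and collects the class names of the individual parsers. It assumes the listing
is a nested object with a "name" per parser and a "children" array under
composite parsers. That shape, the sample data and the parser_names helper are
assumptions for illustration, not checked against a particular Tika version.

```python
import json


def parser_names(node):
    """Return the class names of all parsers in the listing that have no children.

    Assumes each node is a dict with a "name" key and, for composite
    parsers, a "children" list of nodes in the same form.
    """
    children = node.get("children") or []
    if not children:
        return {node["name"]}
    names = set()
    for child in children:
        names |= parser_names(child)
    return names


# Hypothetical sample response, for illustration only
SAMPLE = """
{"name": "org.apache.tika.parser.DefaultParser", "composite": true,
 "children": [
   {"name": "org.apache.tika.parser.pdf.PDFParser", "composite": false},
   {"name": "org.apache.tika.parser.html.HtmlParser", "composite": false}
 ]}
"""

if __name__ == "__main__":
    for name in sorted(parser_names(json.loads(SAMPLE))):
        print(name)
```

In practice the sample string would be replaced by whatever the server actually
returns, and the key names adjusted to match.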
However, I'm not sure about generating all of the page automatically. The
current formats page has quite a lot of manually written text in it around the
support for each format, and it manually groups related formats together,
along with links to the relevant parsers.
Maybe it would be better to have something which calls the Tika App's list
parsers option, then warns you about any parser that doesn't get mentioned in
the formats page?
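A minimal sketch of such a check, assuming the parser listing has already been
captured as text and contains one fully qualified class name per parser, and
that the formats page is available as a plain string. The function names, the
ignore parameter and the rule for what counts as "mentioned" (full or simple
class name appears somewhere in the page) are all illustrative assumptions.

```python
import re

# Matches dotted Java class names such as org.apache.tika.parser.pdf.PDFParser
CLASS_NAME = re.compile(r"\b(?:[a-z_][A-Za-z0-9_]*\.)+[A-Z][A-Za-z0-9_$]*")


def listed_parsers(list_output):
    """Extract every fully qualified class name found in the listing text."""
    return set(CLASS_NAME.findall(list_output))


def unmentioned_parsers(list_output, formats_page, ignore=()):
    """Return the listed parsers that the formats page never mentions.

    A parser counts as mentioned if either its full class name or its
    simple class name appears anywhere in the page text.
    """
    missing = []
    for class_name in sorted(listed_parsers(list_output) - set(ignore)):
        simple_name = class_name.rsplit(".", 1)[-1]
        if class_name not in formats_page and simple_name not in formats_page:
            missing.append(class_name)
    return missing


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    listing = """org.apache.tika.parser.DefaultParser (Composite Parser):
      org.apache.tika.parser.pdf.PDFParser
      org.apache.tika.parser.html.HtmlParser
    """
    page = "PDF documents are handled by the PDFParser class."
    skip = {"org.apache.tika.parser.DefaultParser"}
    for name in unmentioned_parsers(listing, page, ignore=skip):
        print("WARNING: not mentioned in formats page:", name)
```

The ignore set is there for composite or wrapper parsers that nobody would
expect the formats page to describe.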
> Generate list of supported and detected types automatically
> -----------------------------------------------------------
>
> Key: TIKA-411
> URL: https://issues.apache.org/jira/browse/TIKA-411
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Reporter: Jukka Zitting
> Priority: Minor
>
> Currently we edit the list of supported types
> (http://lucene.apache.org/tika/0.7/formats.html) manually, which is bound to
> leave the list outdated and incomplete. It would be better if the list was
> automatically generated from the tika-mimetypes.xml file and the
> getSupportedTypes() response of the AutoDetectParser class.