[
https://issues.apache.org/jira/browse/TIKA-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723160#comment-17723160
]
Sandeep Kulkarni commented on TIKA-3984:
----------------------------------------
I would like to upvote this request. It is the same request that my
colleague Neha Kamat raised on the user mailing list at
[[email protected]|https://lists.apache.org/[email protected]].
Based on the list of supported extensions, we plan to implement filters in
our application so that only files with supported extensions are sent to
Tika for extraction, and files with unsupported extensions are never sent
to Tika for processing.
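To illustrate the kind of filter we have in mind, here is a minimal sketch in
plain Java. The class name ExtensionFilter, the method shouldSend, and the
three extensions in the allow-list are hypothetical placeholders, not taken
from Tika; the real list would come from the documentation requested in this
issue. A check on the file name alone is only an approximation, since it does
not look at the file content.

```java
import java.util.Locale;
import java.util.Set;

public class ExtensionFilter {
    // Hypothetical allow-list of lower-case extensions (without the dot).
    // The real values would be taken from the supported-formats table.
    private final Set<String> supported;

    public ExtensionFilter(Set<String> supported) {
        this.supported = supported;
    }

    // Returns the lower-cased extension of the last path segment,
    // or "" when the name has no extension (e.g. "README", ".bashrc").
    static String extensionOf(String fileName) {
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        String base = fileName.substring(slash + 1);
        int dot = base.lastIndexOf('.');
        if (dot <= 0 || dot == base.length() - 1) {
            return "";
        }
        return base.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True when the file should be forwarded for extraction.
    public boolean shouldSend(String fileName) {
        return supported.contains(extensionOf(fileName));
    }

    public static void main(String[] args) {
        // Placeholder extensions, for illustration only.
        ExtensionFilter filter = new ExtensionFilter(Set.of("odt", "pdf", "docx"));
        String[] names = {"report.odt", "SCAN.PDF", "core.dump", "README"};
        for (String name : names) {
            String decision = filter.shouldSend(name) ? "send" : "skip";
            System.out.println(name + " -> " + decision);
        }
    }
}
```

With a lookup table of the kind described in the issue, the allow-list could
be generated from the documentation instead of being maintained by hand.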
> Summarize Available Parsers as mapped to file types and Maven artifacts
> -----------------------------------------------------------------------
>
> Key: TIKA-3984
> URL: https://issues.apache.org/jira/browse/TIKA-3984
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 2.7.0
> Reporter: Marc Ubaldino
> Priority: Major
>
> Documentation needed: a discrete and clear list of the Maven artifacts used
> to configure a given Parser to handle a given file type.
> User question: To manipulate an ".odt" file, which Parser do I use and which
> Maven artifact should I choose? (Pick any file extension or media
> category.) How easy is it for non-Tika users or seasoned users to locate the
> answer?
> Inspiration: [https://maven.apache.org/plugins/index.html] – Clear, concise.
> Tika Resources:
> * Parser listing:
> [https://cwiki.apache.org/confluence/display/TIKA/Parsers]
> * Migration details for old Parsers:
> [https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]
> * File type listing:
> [https://tika.apache.org/2.7.0/formats.html#Full_list_of_Supported_Formats_in_standard_artifacts]
>
> Some sort of table would be great for a lookup. 3-5 columns:
> * Media type
> * File extensions (MIME strings)
> * Parser class
> * Tika Maven coordinates to get Parser class
> * Link to a relevant how-to or examples for the Media type and Parser class
> Thank you,
> Marc
> // Tika user since 1.2
--
This message was sent by Atlassian Jira
(v8.20.10#820010)