Marc Ubaldino created TIKA-3984:
-----------------------------------

             Summary: Summarize Available Parsers as mapped to file types and 
Maven artifacts
                 Key: TIKA-3984
                 URL: https://issues.apache.org/jira/browse/TIKA-3984
             Project: Tika
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 2.7.0
            Reporter: Marc Ubaldino


Documentation needed:  a discrete, clear list of the Maven artifacts required to 
configure a given Parser to handle a given file type.

User Question - To manipulate an ".odt" file, which Parser do I use and which Maven 
artifact should I choose?   (Pick any file extension or media category.)  How 
easy is it for newcomers to Tika or seasoned users to locate the answer?

Inspiration: [https://maven.apache.org/plugins/index.html]   – Clear, concise.

Tika Resources:
 * Parser listing:  
[https://cwiki.apache.org/confluence/display/TIKA/Parsers]
 * Migration details for old Parsers: 
[https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]
 * File type listing: 
[https://tika.apache.org/2.7.0/formats.html#Full_list_of_Supported_Formats_in_standard_artifacts]

 

Some sort of table would be great for a lookup.  3-5 columns:
 * Media type
 * File extensions (MIME strings)
 * Parser class
 * Tika Maven coordinates to get Parser class
 * Links to relevant how-tos or examples for the Media type and Parser class
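
To make the proposed columns concrete, here is a rough Java sketch of one table row and a lookup by file extension. ParserEntry and ParserLookup are hypothetical names for illustration only, not Tika classes. The single ".odt" row reflects my understanding of the 2.x module layout (the parser class and especially the Maven coordinates are assumptions) and should be verified against the release before it goes into any documentation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// One row of the proposed lookup table (hypothetical type, not part of Tika).
record ParserEntry(String mediaType, List<String> extensions,
                   String parserClass, String mavenCoordinates, String howToUrl) {}

final class ParserLookup {
    // Illustrative row only: the values are assumptions to be checked
    // against the Tika release in use.
    static final List<ParserEntry> TABLE = List.of(
        new ParserEntry(
            "application/vnd.oasis.opendocument.text",
            List.of("odt"),
            "org.apache.tika.parser.odf.OpenDocumentParser",
            "org.apache.tika:tika-parser-miscoffice-module",
            "https://cwiki.apache.org/confluence/display/TIKA/Parsers"));

    // Find the row for an extension, ignoring case and an optional leading dot.
    static Optional<ParserEntry> find(String extension) {
        String ext = extension.toLowerCase(Locale.ROOT).replaceFirst("^\\.", "");
        return TABLE.stream()
                    .filter(e -> e.extensions().contains(ext))
                    .findFirst();
    }

    public static void main(String[] args) {
        // Answers the user question above: which parser, from which artifact?
        find(".odt").ifPresent(e ->
            System.out.println(e.parserClass() + " <- " + e.mavenCoordinates()));
    }
}
```

The same five fields map one-to-one onto the table columns listed above, so a documentation page could be generated from such a list, or the list could simply be maintained by hand as a wiki table.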

Thank you,

Marc

// Tika  user since 1.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
