[ https://issues.apache.org/jira/browse/RAT-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788111#comment-17788111 ]
Claude Warren commented on RAT-150:
-----------------------------------
What we need is the ability to identify each file's type and extract its comments.
Tika should be able to help us identify file types, but I don't know how much it
will help with comment extraction. Some exploration is needed into what Tika can
do for extracting comments from various file types, and what it can do for
identifying binary file types so that we can report them properly.
> RAT should use Apache Tika to simply guess ignored [application/X] file types
> and focus on the [text/Y] family as a sensible default
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: RAT-150
> URL: https://issues.apache.org/jira/browse/RAT-150
> Project: Apache Rat
> Issue Type: New Feature
> Components: mime-meta-data, scan
> Affects Versions: 0.8
> Reporter: Chris A. Mattmann
> Priority: Major
>
> RAT could use Apache Tika to automatically guess file types, obviating the
> need to specify an explicit white list or black list.
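A minimal sketch of the default proposed in the issue title (scan the text/* family, ignore everything else), assuming the media type string has already been obtained from a detector such as Tika, for example via `new Tika().detect(file)`, which returns a string like `text/plain`. The class name `MimeTriage`, the `SCAN`/`IGNORE` categories and the allow-list are hypothetical and are not existing RAT or Tika API. The allow-list is included because Tika reports some text formats under `application/` (XML files, for instance, are detected as `application/xml`), so a strict `text/*` rule would skip them.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical triage policy for RAT-150 (not existing RAT API): given a
 * detected media type string, decide whether a file should be scanned for a
 * license header or ignored/reported as binary.
 */
public final class MimeTriage {

    /** What the scanner should do with a file. */
    public enum Category { SCAN, IGNORE }

    // Hypothetical allow-list: text-based formats that detectors commonly
    // report under application/ rather than text/.
    private static final Set<String> TEXT_LIKE_APPLICATION_TYPES = Set.of(
            "application/xml",
            "application/json",
            "application/javascript",
            "application/x-sh");

    private MimeTriage() {}

    public static Category classify(String mediaType) {
        if (mediaType == null || mediaType.isBlank()) {
            return Category.IGNORE; // nothing detected: treat as not scannable
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String base = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (base.startsWith("text/")
                || base.endsWith("+xml")           // e.g. image/svg+xml is XML text
                || TEXT_LIKE_APPLICATION_TYPES.contains(base)) {
            return Category.SCAN;
        }
        return Category.IGNORE; // images, archives, octet-stream, etc.
    }

    /** Prints the category for each media type given on the command line. */
    public static void main(String[] args) {
        for (String type : args) {
            System.out.println(type + " -> " + classify(type));
        }
    }
}
```

Running `java MimeTriage text/x-java-source application/zip` prints `text/x-java-source -> SCAN` followed by `application/zip -> IGNORE`. Keeping the policy a pure function of the media type string separates it from detection, so the same rule could be tested without Tika on the classpath and the allow-list could later become configurable.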
--
This message was sent by Atlassian Jira
(v8.20.10#820010)