[ https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347418#comment-17347418 ]
Nick Burch edited comment on TIKA-3409 at 5/19/21, 8:47 AM:
------------------------------------------------------------
Do you want to know whether Apache Tika can parse the file? Or whether you
could make sense of it by, e.g., opening it in a text editor?
For the former, you can ask Tika which MIME types it has parsers for; see
[https://cwiki.apache.org/confluence/display/TIKA/Troubleshooting+Tika#TroubleshootingTika-IdentifyingwhatParsersyourTikainstallsupports]
For the latter, you can get most of the way there by asking Tika to detect the
type, then asking it for that type's aliases and any parent types. Finally,
check whether the primary type, any alias, or any parent starts with {{text/}}.
That should catch pretty much anything that's actually text-based.
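
The check described above could be sketched roughly as follows. This is only an illustration of the logic, not Tika's own API: the class {{TextTypeCheck}}, its method {{isTextBased}}, and the two lookup parameters are hypothetical names, and the registry lookups are passed in so the sketch has no Tika dependency.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper: sketch of the "is it text-based?" check. */
final class TextTypeCheck {

    /**
     * Returns true if the detected type, one of its aliases, or any ancestor
     * type (or an ancestor's alias) starts with "text/".
     *
     * aliasesOf and parentOf are assumed stand-ins for registry lookups;
     * parentOf must return null when a type has no parent.
     */
    static boolean isTextBased(String detected,
                               Function<String, Set<String>> aliasesOf,
                               Function<String, String> parentOf) {
        Set<String> seen = new HashSet<>(); // guards against cycles in the lookup data
        String current = detected;
        while (current != null && seen.add(current)) {
            if (current.startsWith("text/")) {
                return true;
            }
            for (String alias : aliasesOf.apply(current)) {
                if (alias.startsWith("text/")) {
                    return true;
                }
            }
            current = parentOf.apply(current); // walk up to the parent type
        }
        return false;
    }

    /** Convenience overload backed by plain maps, handy for experimenting. */
    static boolean isTextBased(String detected,
                               Map<String, Set<String>> aliases,
                               Map<String, String> parents) {
        return isTextBased(detected,
                t -> aliases.getOrDefault(t, Set.of()),
                parents::get);
    }
}
```

With Tika itself, the starting type would presumably come from {{Tika.detect(...)}}, and the two lookups would be backed by {{MediaTypeRegistry.getAliases(MediaType)}} and {{MediaTypeRegistry.getSupertype(MediaType)}}. That wiring is an assumption left out of the sketch.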
was (Author: gagravarr):
Do you want to know whether Apache Tika can parse the file? Or whether you
could make sense of it by, e.g., opening it in a text editor?
For the former, you can ask Tika which MIME types it has parsers for; see
[https://cwiki.apache.org/confluence/display/TIKA/Troubleshooting+Tika#TroubleshootingTika-IdentifyingwhatParsersyourTikainstallsupports]
For the latter, you can get most of the way there by asking Tika to detect the
type, then asking it for that type's aliases, and checking whether the primary
type or any alias starts with `text/`. That should catch pretty much anything
that's actually text-based.
> provide isBinary/isText method
> ------------------------------
>
> Key: TIKA-3409
> URL: https://issues.apache.org/jira/browse/TIKA-3409
> Project: Tika
> Issue Type: New Feature
> Reporter: Caleb Cushing
> Priority: Major
>
> Since Tika can detect what kind of file something is, it could also know
> whether that file type is binary or not. I'd love to have a method
> `MimeType::isBinary` or something similar, so I could know whether I can try
> "parsing" the file.
> Related: https://stackoverflow.com/q/620993/206466