[ https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347491#comment-17347491 ]
Caleb Cushing commented on TIKA-3409:
-------------------------------------
The latter. But if you look at the SO question I referenced, some types that are
... text-editor readable start with application/ instead of text/. I'm not
sure whether detect will return application or text for XML or JSON, or for any
number of other formats that I'm sure use application, but that's where the
problem lies. Even if Tika is consistent and everything textual returns text
from detect, that seems like an implementation detail of Tika. Also, matching
on the type "text" to answer the question feels a little implementation-oriented
rather than domain-oriented.
> provide isBinary/isText method
> ------------------------------
>
> Key: TIKA-3409
> URL: https://issues.apache.org/jira/browse/TIKA-3409
> Project: Tika
> Issue Type: New Feature
> Reporter: Caleb Cushing
> Priority: Major
>
> Since Tika can detect what kind of file something is, it could also know
> whether that file type is binary or not. I'd love to have a method
> `MimeType::isBinary` or something similar, so I could know whether I can try
> "parsing" the file.
> Related: https://stackoverflow.com/q/620993/206466
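A different angle on the question in the linked Stack Overflow thread is to decide from the bytes rather than the declared type. One common heuristic, sketched here, samples the start of the stream and calls it binary if it contains a NUL byte or a high share of other control characters. This assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (UTF-16 text contains NUL bytes and would be misclassified). The class `BinarySniffer`, the 8 KiB sample size, and the 30% threshold are hypothetical choices, not Tika behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BinarySniffer {

    // Hypothetical tuning values: how many bytes to inspect and what share of
    // suspicious control bytes tips the decision towards "binary".
    private static final int SAMPLE_SIZE = 8192;
    private static final double CONTROL_THRESHOLD = 0.30;

    /** Heuristic only: guesses whether the stream holds binary data. */
    public static boolean looksBinary(InputStream in) throws IOException {
        byte[] sample = in.readNBytes(SAMPLE_SIZE); // Java 11+
        if (sample.length == 0) {
            return false; // treat empty input as (trivially) text
        }
        int control = 0;
        for (byte b : sample) {
            int c = b & 0xFF;
            if (c == 0) {
                return true; // NUL almost never appears in ASCII-compatible text
            }
            // Count C0 control characters other than tab, LF, FF and CR.
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\f' || c == '\r';
            if (c < 0x20 && !allowedWhitespace) {
                control++;
            }
        }
        return (double) control / sample.length > CONTROL_THRESHOLD;
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "{\"hello\": \"world\"}\n".getBytes(StandardCharsets.UTF_8);
        // First bytes of a PNG header, which include NUL bytes.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n', 0, 0, 0, 13};
        System.out.println(looksBinary(new ByteArrayInputStream(json)));
        System.out.println(looksBinary(new ByteArrayInputStream(png)));
    }
}
```

Running `main` prints `false` for the JSON sample and `true` for the PNG header. Because it only inspects bytes, this complements a type-based check rather than replacing it.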
--
This message was sent by Atlassian Jira
(v8.3.4#803005)