[ 
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347491#comment-17347491
 ] 

Caleb Cushing edited comment on TIKA-3409 at 5/19/21, 10:32 AM:
----------------------------------------------------------------

The latter. But if you look at the SO question I referenced, there are some
types that are text-editor readable yet start with application/ instead of
text/. I'm not sure whether detect will return application or text for XML or
JSON, or for any number of other formats that I'm sure use application, and
that's where the problem lies. Even if Tika is consistent (dubious) and detect
returns text for everything that is textual, that seems like an implementation
detail of Tika. Also, matching on type "text" to ask the question feels
implementation-oriented rather than domain-oriented.
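
To make the concern concrete, here is a minimal sketch in plain Java that only inspects a media type string. Everything in it is an assumption for illustration: `TextTypeSketch`, `hasTextPrefix`, `looksTextual`, and the small allow-list of subtypes are hypothetical names, not Tika API, and the sketch says nothing about what Tika's detect actually returns.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: it only looks at a media type string.
final class TextTypeSketch {

    // Assumed allow-list of application/* subtypes that open fine in a text editor.
    private static final Set<String> TEXTUAL_SUBTYPES =
            Set.of("xml", "json", "javascript");

    // The naive check: treat only text/* as text.
    static boolean hasTextPrefix(String mediaType) {
        return mediaType.toLowerCase(Locale.ROOT).startsWith("text/");
    }

    // A broader guess: text/*, the allow-list above, or a +xml / +json suffix.
    static boolean looksTextual(String mediaType) {
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String base = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash < 0) {
            return false; // not a type/subtype pair
        }
        String type = base.substring(0, slash);
        String subtype = base.substring(slash + 1);
        return type.equals("text")
                || (type.equals("application") && TEXTUAL_SUBTYPES.contains(subtype))
                || subtype.endsWith("+xml")
                || subtype.endsWith("+json");
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/plain", "application/json", "application/xml",
            "image/svg+xml", "application/pdf"
        };
        for (String t : samples) {
            System.out.println(t + " prefix=" + hasTextPrefix(t)
                    + " textual=" + looksTextual(t));
        }
    }
}
```

The prefix check misses application/json and application/xml, while the broader check needs a hand-maintained list that callers would each have to keep up to date. That is why a domain-level method on the library side seems preferable.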


was (Author: xenoterracide):
The latter. But if you look at the SO question I referenced, there are some
types that are text-editor readable yet start with application/ instead of
text/. I'm not sure whether detect will return application or text for XML or
JSON, or for any number of other formats that I'm sure use application, and
that's where the problem lies. Even if Tika is consistent and detect returns
text for everything that is textual, that seems like an implementation detail
of Tika. Also, matching on type "text" to ask the question feels
implementation-oriented rather than domain-oriented.

> provide isBinary/isText method
> ------------------------------
>
>                 Key: TIKA-3409
>                 URL: https://issues.apache.org/jira/browse/TIKA-3409
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Caleb Cushing
>            Priority: Major
>
> Since Tika can detect what kind of file something is, it could also know
> whether that file type is binary or not. I'd love to have a method such as
> `MimeType::isBinary`, so I could know whether I can try "parsing" the file.
> Related: https://stackoverflow.com/q/620993/206466



