[ https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927111#comment-17927111 ]
Subbu edited comment on TIKA-4370 at 2/14/25 12:15 PM:
-------------------------------------------------------
[~tallison]: Thanks for the reply. What I am trying to understand is this:
instead of applications having to retry detection with a file name hint, what
do you think would be a good change in TextDetector so that it works out of
the box in tika-core? If TXTParser can identify that the incoming byte stream
is Shift_JIS, can't we use that same conclusion here to declare that the file
is textual? Or do you see differences?
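
To make the question concrete, here is a minimal sketch of the kind of check I
have in mind. It uses only the JDK, not Tika. The class SjisCheck and the
method looksLikeShiftJis are hypothetical names, and this is not how
TextDetector or TXTParser are implemented; it only illustrates "if the bytes
decode strictly as Shift_JIS, treat them as text".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class SjisCheck {

    // Hypothetical helper (not part of Tika): returns true when the given
    // bytes decode as Shift_JIS without any malformed or unmappable sequence.
    public static boolean looksLikeShiftJis(byte[] prefix) {
        CharsetDecoder decoder = Charset.forName("Shift_JIS")
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode(ByteBuffer) treats the buffer as the complete input, so a
            // dangling lead byte at the end is reported as malformed.
            decoder.decode(ByteBuffer.wrap(prefix));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // "konnichiwa" in hiragana, written with escapes to avoid
        // source-encoding issues.
        byte[] japanese = "\u3053\u3093\u306b\u3061\u306f".getBytes(sjis);
        System.out.println(looksLikeShiftJis(japanese));
        // A lone Shift_JIS lead byte with no trail byte.
        System.out.println(looksLikeShiftJis(new byte[] {(byte) 0x81}));
    }
}
```

I realize a check like this is weak on its own: binary data can decode cleanly
by chance, and a fixed-size prefix may cut a two-byte character in half. Those
may be exactly the differences you have in mind.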
was (Author: JIRAUSER307746):
[~tallison] : Thanks for the reply. What I am trying to understand instead of
applications trying to retry detection with file name hint, what do you think
is a good change in TextDetector so that it can work out of the box in
tika-core? If TXTParser can identify the incoming byte stream is Shift_JIS,
isn't it the same thing that we can conclude here to declare that the file is
textual here?
> SJIS Encoded Files Can't be Detected
> ------------------------------------
>
> Key: TIKA-4370
> URL: https://issues.apache.org/jira/browse/TIKA-4370
> Project: Tika
> Issue Type: Bug
> Reporter: Subbu
> Priority: Major
>
> When the character encoding of a file is SJIS and there is no file name in
> the metadata, the content type of most files is detected as
> application/octet-stream. Is there no support for SJIS at all?