[
https://issues.apache.org/jira/browse/TIKA-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395013#comment-17395013
]
Chaitra Rajappa commented on TIKA-3516:
---------------------------------------
I have referred to the other ticket, but since it offers no solution for what can
be done, I wanted to know whether you have a plan of action for such
cases.
_You can change the order of charset detectors if that would help you for your
use cases?_
You mentioned the line above; can you tell me how this would help?
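One way to read the suggestion: as I understand it, Tika's composite encoding detection consults its detectors in a configured order and uses the first non-null answer. Moving a detector that only answers when it is sure (for example, one that checks the bytes are valid UTF-8) ahead of the statistical guesser could keep a misfire like IBM424_rtl from ever being used. The stdlib-only sketch below illustrates that ordering effect. `OrderedDetection`, `detect` and `strictUtf8` are hypothetical names, not Tika's API, and the misfiring detector is a stub that always returns IBM424_rtl.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class OrderedDetection {

    // Consults detectors in list order and returns the first non-null answer,
    // so whichever detector comes first "wins" whenever it has an opinion.
    static Optional<String> detect(List<Function<byte[], String>> detectors, byte[] input) {
        for (Function<byte[], String> detector : detectors) {
            String name = detector.apply(input);
            if (name != null) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    // Strict check: answers "UTF-8" only if the bytes decode as UTF-8 without errors,
    // otherwise returns null to defer to the next detector in the list.
    static String strictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hebrew text plus ASCII, encoded as UTF-8 (IBM424 is a Hebrew EBCDIC code page).
        byte[] input = "\u05e9\u05dc\u05d5\u05dd, hello".getBytes(StandardCharsets.UTF_8);

        // Hypothetical stub that misfires the way this issue describes.
        Function<byte[], String> statisticalGuess = b -> "IBM424_rtl";
        Function<byte[], String> utf8Check = OrderedDetection::strictUtf8;

        List<Function<byte[], String>> guessFirst = List.of(statisticalGuess, utf8Check);
        List<Function<byte[], String>> utf8First = List.of(utf8Check, statisticalGuess);

        System.out.println(detect(guessFirst, input)); // Optional[IBM424_rtl]
        System.out.println(detect(utf8First, input));  // Optional[UTF-8]
    }
}
```

The trade-off is that the first detector wins whenever it answers, so it should be one that rarely answers wrongly. A strict validity check is a reasonable candidate; a statistical guesser placed first shadows everything after it.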
> Unexpected charset IBM424_rtl detected for utf_8 file by CharsetDetector
> --------------------------------------------------------------------------
>
> Key: TIKA-3516
> URL: https://issues.apache.org/jira/browse/TIKA-3516
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Reporter: Chaitra Rajappa
> Priority: Major
>
> Hi,
> The CharsetDetector detects the wrong charset, IBM424_rtl, for a UTF-8 file,
> resulting in the exception:
> *_java.nio.charset.UnsupportedCharsetException: IBM424_rtl at
> java.nio.charset.Charset.forName(Charset.java:531)_*
> I see there is also an existing ticket with the same issue that has not been
> fixed:
> https://issues.apache.org/jira/browse/TIKA-2396
> Please suggest the changes to fix this.
> Versions being used:
> tika-core 1.20
> tika-parsers 1.20
> Thanks
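Independently of detector order, the exception itself comes from passing the detector's top guess straight to `Charset.forName`. Tika's `CharsetDetector.detectAll()` returns a ranked array of matches, so a caller could walk that ranking and keep the first name the JVM actually supports. The stdlib-only sketch below assumes the ranking has already been reduced to a plain list of names (standing in for the `CharsetMatch` results); `SupportedCharsetPicker`, `pick` and `jvmSupports` are hypothetical helpers, not part of Tika.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SupportedCharsetPicker {

    // True only if the JVM can provide this charset. Charset.isSupported throws
    // for syntactically illegal names, so those are treated as unsupported too.
    static boolean jvmSupports(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    // Walks ranked candidate names (best first) and returns the first one the JVM
    // supports, instead of calling Charset.forName on the top guess and risking
    // UnsupportedCharsetException. Falls back when nothing in the ranking is usable.
    static Charset pick(List<String> rankedNames, Charset fallback) {
        for (String name : rankedNames) {
            if (jvmSupports(name)) {
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Hypothetical ranking whose top guess is the unsupported name from this issue.
        List<String> ranked = List.of("IBM424_rtl", "UTF-8", "ISO-8859-1");
        System.out.println(pick(ranked, StandardCharsets.UTF_8));                        // UTF-8
        System.out.println(pick(List.of("IBM424_rtl"), StandardCharsets.ISO_8859_1));    // ISO-8859-1
    }
}
```

This only avoids the crash; if the top-ranked guess is wrong but happens to be supported, the text will still be decoded incorrectly, which is why detector order matters as well.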
--
This message was sent by Atlassian Jira
(v8.3.4#803005)