Chaitra Rajappa created TIKA-3516:
-------------------------------------
Summary: Unexpected charset IBM424_rtl detected for utf_8 file by CharsetDetector
Key: TIKA-3516
URL: https://issues.apache.org/jira/browse/TIKA-3516
Project: Tika
Issue Type: Bug
Components: detector, parser
Reporter: Chaitra Rajappa
Hi,
The CharsetDetector detects the wrong charset, IBM424_rtl, for a UTF-8 file.
This results in the following exception:
*_java.nio.charset.UnsupportedCharsetException: IBM424_rtl 17 at
java.nio.charset.Charset.forName(Charset.java:531)_*
I see there is also an existing ticket for the same issue that has not been fixed:
https://issues.apache.org/jira/browse/TIKA-2396
Please suggest how this can be fixed.
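Until the detection itself is fixed, a caller-side guard might avoid the exception. The following is only a sketch and does not call Tika: the class name SafeCharsetPick, the method firstSupported, and the hard-coded candidate list are hypothetical and stand in for whatever charset names the detector reports. It relies only on java.nio.charset.Charset.isSupported and Charset.forName from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SafeCharsetPick {

    // Hypothetical helper: returns the first candidate name that this JVM
    // can resolve to a Charset, or the given fallback if none can be resolved.
    public static Charset firstSupported(List<String> candidateNames, Charset fallback) {
        for (String name : candidateNames) {
            try {
                // isSupported returns false for names the JVM does not know,
                // so forName is never called on them.
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Thrown for null or syntactically illegal names
                // (IllegalCharsetNameException is a subclass); skip the candidate.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed detector output, best match first (illustrative values only).
        List<String> candidates = Arrays.asList("IBM424_rtl", "UTF-8");
        Charset chosen = firstSupported(candidates, StandardCharsets.ISO_8859_1);
        System.out.println(chosen);
    }
}
```

With this guard, a name such as IBM424_rtl that the JVM cannot resolve is skipped instead of being passed to Charset.forName. This only hides the symptom; the file would still be misdetected by the detector.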
Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)