rdamir created TIKA-2737:
----------------------------
Summary: regression in charset detection
Key: TIKA-2737
URL: https://issues.apache.org/jira/browse/TIKA-2737
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 1.19, 1.18, 1.17
Reporter: rdamir
Attachments: CharsetDetectorTest.java, cbp12pr_ia_st.txt,
charset-match-tike1.16.png, charset-match-tike1.17.png
The attached text file (cbp12pr_ia_st.txt) is a test CSV file that I use for
testing a CSV parser. The test passed with versions 1.13 through 1.16. I'm
trying to upgrade to the latest version, 1.19, but the test has been failing
since version 1.17 (see the attachments for the charset matches in version 1.16
and in version 1.17). The attached test file contains a method, testFailure
(the last one), that shows the wrong detection: the expected charset is UTF-8,
but IBM500 is detected.
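
As a sanity check that does not depend on Tika, the expectation that the file
is UTF-8 can be tested with a strict decode using only the JDK. This is a
minimal sketch, not part of the attached CharsetDetectorTest.java: the class
name Utf8Check and the method isValidUtf8 are hypothetical names introduced
here for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika or of the attached test.
final class Utf8Check {

    // Returns true only if the whole byte array is well-formed UTF-8.
    // The decoder is configured to report errors instead of silently
    // substituting replacement characters.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " is valid UTF-8: " + isValidUtf8(data));
    }
}
```

Running it against cbp12pr_ia_st.txt would show whether the bytes decode
cleanly as UTF-8. A clean decode does not rule out other ASCII-compatible
encodings, so it only supports the expected result and does not replace the
detector test.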