rdamir created TIKA-2737:
----------------------------

             Summary: regression in charset detection
                 Key: TIKA-2737
                 URL: https://issues.apache.org/jira/browse/TIKA-2737
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 1.19, 1.18, 1.17
            Reporter: rdamir
         Attachments: CharsetDetectorTest.java, cbp12pr_ia_st.txt,
charset-match-tike1.16.png, charset-match-tike1.17.png

The attached text file (cbp12pr_ia_st.txt) is a CSV file I use to test a CSV
parser. The test passed with versions 1.13 through 1.16. I'm trying to upgrade
to the latest version, 1.19, but the test has been failing since version 1.17
(see the attachments for the charset matches in 1.16 and in 1.17). The attached
test file (CharsetDetectorTest.java) contains a method, testFailure (the last
one), that shows the wrong detection: the expected charset is UTF-8, but IBM500
is detected.
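
As an independent sanity check on the expected result, the sketch below tests
whether a byte sequence is well-formed UTF-8 using only the JDK, with no Tika
involved. The class name Utf8Check and the inline sample bytes are hypothetical
and are not taken from the attached test. Passing this check does not prove the
file is UTF-8, because plain ASCII and other byte sequences also pass. It only
shows that UTF-8 is a consistent reading of the bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika or of the attached test.
public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 with no malformed input. */
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, check that file; otherwise check a small sample.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "id,name\n1,caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("well-formed UTF-8: " + isValidUtf8(data));
    }
}
```

Running it with the path to cbp12pr_ia_st.txt as the only argument reports
whether the file's bytes are well-formed UTF-8.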



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
