https://bz.apache.org/bugzilla/show_bug.cgi?id=63323
Bug ID: 63323
Summary: HwmfText's getText can throw StringIndexOutOfRange on shiftjis encoded text
Product: POI
Version: 4.0.x-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
Assignee: dev@poi.apache.org
Reporter: talli...@apache.org
Target Milestone: ---

When upgrading Tika to POI 4.1.0-rc3, one of our unit tests that checks for correct encoding handling started failing. Multibyte character encodings need more careful handling than passing stringLength to substring():

    public String getText(Charset charset) throws IOException {
        return (new String(this.rawTextBytes, charset)).substring(0, this.stringLength);
    }

stringLength is a byte count, but substring() indexes the chars of the already-decoded String. With Shift_JIS, two bytes often decode to a single char, so the decoded String can be shorter than stringLength and substring() throws StringIndexOutOfBoundsException.

The triggering test file is here:
https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testWMF_charset.wmf
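A minimal sketch of one possible fix, not POI's actual code. It assumes (as the MS-WMF META_TEXTOUT record defines it) that stringLength counts bytes and that rawTextBytes may carry trailing padding. The class and method names below are hypothetical. The idea is to truncate at the byte level before decoding, rather than cutting the decoded String at a byte offset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class HwmfTextDecodeSketch {

    // Pattern from the report: decode every raw byte, then cut the decoded
    // String at a byte count. For multibyte charsets the decoded String can
    // be shorter than stringLength, so substring() throws.
    static String decodeBuggy(byte[] rawTextBytes, int stringLength, Charset charset) {
        return new String(rawTextBytes, charset).substring(0, stringLength);
    }

    // Hypothetical fix: decode only the first stringLength bytes.
    // The length is clamped so a corrupt record cannot index past the array.
    static String decodeFixed(byte[] rawTextBytes, int stringLength, Charset charset) {
        int len = Math.min(Math.max(stringLength, 0), rawTextBytes.length);
        return new String(rawTextBytes, 0, len, charset);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        byte[] text = "\u65E5\u672C".getBytes(sjis);     // 2 chars, 4 bytes in Shift_JIS
        byte[] padded = Arrays.copyOf(text, text.length + 1); // simulate a trailing pad byte
        int stringLength = text.length;                     // byte count, as stored in the record

        System.out.println(decodeFixed(text, stringLength, sjis).length());   // prints 2
        System.out.println(decodeFixed(padded, stringLength, sjis).length()); // prints 2

        try {
            decodeBuggy(text, stringLength, sjis);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("buggy: " + e.getClass().getSimpleName());
        }
    }
}
```

Truncating the bytes first also drops any padding bytes, which the decode-then-substring approach would otherwise turn into stray NUL characters whenever the char count happened to line up.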