https://bz.apache.org/bugzilla/show_bug.cgi?id=63323

            Bug ID: 63323
           Summary: HwmfText's getText can throw StringIndexOutOfRange on
                    shiftjis encoded text
           Product: POI
           Version: 4.0.x-dev
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: talli...@apache.org
  Target Milestone: ---

After upgrading Tika to POI 4.1.0-rc3, one of our unit tests that checks for
correct encoding handling now fails.  With a multibyte character encoding, the
decoded String can contain fewer chars than stringLength, so passing
stringLength straight to substring is not safe:


    public String getText(Charset charset) throws IOException {
        return (new String(this.rawTextBytes, charset)).substring(0, this.stringLength);
    }
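
The failure can be reproduced outside POI with a few lines of plain Java.
This is only a minimal sketch, not a proposed patch: the class and method
names are hypothetical, it assumes stringLength is a byte count taken from
the record, and it assumes the JDK in use ships the Shift_JIS charset.  The
second helper shows one possible approach, which is to limit the bytes before
decoding instead of trimming the decoded String.

```java
import java.nio.charset.Charset;

public class ShiftJisTextSketch {

    // Mirrors the getText logic quoted above: decode everything, then trim by stringLength.
    public static String decodeThenSubstring(byte[] rawTextBytes, int stringLength, Charset charset) {
        return new String(rawTextBytes, charset).substring(0, stringLength);
    }

    // Hypothetical alternative (an assumption, not POI's actual fix):
    // treat stringLength as a byte count and decode only that many bytes.
    public static String decode(byte[] rawTextBytes, int stringLength, Charset charset) {
        int byteCount = Math.max(0, Math.min(stringLength, rawTextBytes.length));
        return new String(rawTextBytes, 0, byteCount, charset);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // Three kanji written as Unicode escapes so the source file encoding does not matter.
        String text = "\u65E5\u672C\u8A9E";
        byte[] raw = text.getBytes(sjis);   // two bytes per character in Shift_JIS
        int stringLength = raw.length;      // assumed to be a byte count

        try {
            decodeThenSubstring(raw, stringLength, sjis);
        } catch (StringIndexOutOfBoundsException e) {
            // The decoded String has 3 chars, but substring was asked for 6.
            System.out.println("substring failed: " + e.getMessage());
        }

        System.out.println("decoded by byte count: " + decode(raw, stringLength, sjis));
    }
}
```

Clamping to rawTextBytes.length keeps the sketch safe if the stored length is
larger than the buffer; whether that is the right behavior for POI is a
separate question.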

The triggering test file is here:
https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/test-documents/testWMF_charset.wmf

