https://issues.apache.org/bugzilla/show_bug.cgi?id=51946

             Bug #: 51946
           Summary: [BUG] TextPieceTable <init>
                    ArrayIndexOutOfBoundsException and
                    IllegalStateException - Hong Kong encoding?
           Product: POI
           Version: 3.8-dev
          Platform: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Unable to include a sample document due to its sensitive nature.

If there are any pointers to utilities that can further investigate the documents,
let me know and I'll see what further information I can supply.

A few of my documents trigger an arraycopy with a length that's greater than
the amount remaining in the stream buffer.  The files open successfully in
Word 2010 and may predate the Word 97 format.  The documents likely use an
encoding from the Hong Kong region.
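
For illustration only, here is a minimal sketch of the failure mode described above: a copy whose requested length exceeds what is left in the buffer. The class and method names are hypothetical and this is not POI's code. It only shows how an unchecked System.arraycopy fails, and how a bounds check could report the sizes involved instead.

```java
class PieceCopySketch {
    /**
     * Hypothetical helper, not part of POI: copies `length` bytes from `stream`
     * starting at `start`, checking first that the stream holds that many bytes.
     */
    static byte[] copyPiece(byte[] stream, int start, int length) {
        int remaining = stream.length - start;
        if (start < 0 || length < 0 || length > remaining) {
            throw new IllegalArgumentException(
                "Piece wants " + length + " bytes at offset " + start
                + " but only " + remaining + " remain in the stream");
        }
        byte[] buf = new byte[length];
        System.arraycopy(stream, start, buf, 0, length);
        return buf;
    }

    public static void main(String[] args) {
        byte[] stream = new byte[10];

        // Unchecked copy: asking for 7 bytes when only 6 remain fails inside arraycopy.
        try {
            System.arraycopy(stream, 4, new byte[7], 0, 7);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked copy failed: " + e);
        }

        // Checked copy: the same request fails earlier, with a message naming the sizes.
        try {
            copyPiece(stream, 4, 7);
        } catch (IllegalArgumentException e) {
            System.out.println("checked copy failed: " + e.getMessage());
        }
    }
}
```

A check of this kind would not make the affected documents readable, but it would put the requested and available lengths in the error message.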


A couple produce the following stack trace (daily build):
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:108)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:71)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:410)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:69)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)



More than a handful are caught earlier on and produce this stack trace:
Caused by: java.lang.IllegalStateException: Told we're for characters 0 -> 6385, but actually covers 6373 characters!
    at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:73)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:115)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:71)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:410)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:69)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)
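
One possible reading of the 6385 versus 6373 mismatch, which is a guess and not confirmed against these files: if the text is stored in a double-byte code page such as Big5, a length counted in bytes will exceed the number of decoded characters. The sketch below shows only that general effect. The class name, the sample bytes, and the choice of Big5 are assumptions for illustration, and none of this is POI code.

```java
import java.nio.charset.Charset;

class CharCountSketch {
    /** Number of Java chars produced by decoding `bytes` with the named charset. */
    static int decodedLength(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName)).length();
    }

    public static void main(String[] args) {
        // 'A', one two-byte Big5 character (0xA4 0xA4), then 'B': 4 bytes in total.
        byte[] bytes = { 0x41, (byte) 0xA4, (byte) 0xA4, 0x42 };

        System.out.println("bytes: " + bytes.length);
        // A single-byte charset yields one char per byte.
        System.out.println("chars as ISO-8859-1: " + decodedLength(bytes, "ISO-8859-1"));
        // Big5 is not one of the charsets every JVM must provide, so check first.
        if (Charset.isSupported("Big5")) {
            System.out.println("chars as Big5: " + decodedLength(bytes, "Big5"));
        }
    }
}
```

If that reading applies here, each double-byte character in a piece would account for one unit of difference between the declared range and the decoded text.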
