https://issues.apache.org/bugzilla/show_bug.cgi?id=55732

--- Comment #4 from Marcel Pokrandt <[email protected]> ---
I can confirm this bug with an old PPT file of my own from '97 that contains
nothing more than an empty text area.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 20
    at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:161)
    at org.apache.poi.hslf.record.StyleTextProp9Atom.<init>(StyleTextProp9Atom.java:70)
    ... 65 more


I have attached a small test case and a suggested fix, a patch to the class
org.apache.poi.hslf.record.StyleTextProp9Atom. Before reading the (unused)
fields textCfException9 and textSiException, I check whether the offset is
already past the end of the array:

if (i >= data.length) {
    break;
}

Since both fields are NOT used anywhere, I think it should be safe to skip
reading them in this case. With my patch, two of my files that failed with the
same error now parse successfully, and I could extract their text.
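
To illustrate the idea outside of POI, here is a minimal, self-contained sketch
of the same kind of guard. It is not the actual StyleTextProp9Atom code: the
class BoundsCheckSketch and the methods getIntLE and readFields are
hypothetical names used only for this example, and getIntLE is merely a
stand-in for a helper such as LittleEndian.getInt. The sketch also assumes a
slightly stricter check than the one in the patch (offset + 4 > data.length
instead of i >= data.length), so that a partially truncated field is skipped
as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache POI.
class BoundsCheckSketch {

    // Reads a 32-bit little-endian int starting at the given offset.
    // Throws ArrayIndexOutOfBoundsException if fewer than 4 bytes remain.
    static int getIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | ((data[offset + 1] & 0xFF) << 8)
                | ((data[offset + 2] & 0xFF) << 16)
                | ((data[offset + 3] & 0xFF) << 24);
    }

    // Reads consecutive 4-byte fields beginning at 'start'.
    // The guard stops the loop before a read that would run past the array,
    // so truncated or missing trailing fields are skipped instead of failing.
    static List<Integer> readFields(byte[] data, int start) {
        List<Integer> fields = new ArrayList<>();
        int offset = start;
        while (true) {
            // Assumption: stricter than the patch, also covers partial fields.
            if (offset + 4 > data.length) {
                break;
            }
            fields.add(getIntLE(data, offset));
            offset += 4;
        }
        return fields;
    }
}
```

With this guard, calling readFields on a 20-byte array with a start offset of
20 returns an empty list instead of throwing, which corresponds to the
situation in the stack trace above.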


I would really appreciate it if you could integrate this patch, because I'm
using POI/Tika to index a large number of Office files, and many of them seem
to fail with the same error.
