[ https://issues.apache.org/jira/browse/TIKA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603102#comment-15603102 ]
Tim Allison edited comment on TIKA-2142 at 10/25/16 12:43 PM:
--------------------------------------------------------------

I'm able to extract 81 images before we hit the AIOOBE with POI. When I open the presentation in PPT, I can only see 78 images, and there appear to be 2 empty slides at the end of the deck. So it looks like PPT is silently ignoring the problem.

[~kiwiwings], should we stop early if we calculate an AIOOBE at the POI level, and just log the problem? Or does this point to a larger problem in our parser?

was (Author: talli...@mitre.org):
    I'm able to extract 81 images before we hit the AIOOBE with POI. When I open the presentation in PPT, I can only see 79 images, and there appear to be 2 empty slides at the end of the deck. So it looks like PPT is silently ignoring the problem.

    [~kiwiwings], should we stop early if we calculate an AIOOBE at the POI level, and just log the problem? Or does this point to a larger problem in our parser?

> ArrayIndexOutOfBoundsException
> ------------------------------
>
>                 Key: TIKA-2142
>                 URL: https://issues.apache.org/jira/browse/TIKA-2142
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>         Environment: Windows 7 x64, JVM 1.8.0_101
>            Reporter: Seva Alekseyev
>         Attachments: HPV8dHinge Confocal Results.ppt
>
>
> On the attached PowerPoint presentation, which opens fine with PowerPoint, the Tika parser throws the following error:
> java.lang.ArrayIndexOutOfBoundsException
>       at java.lang.System.arraycopy(Native Method)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.readPictures(HSLFSlideShowImpl.java:438)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.getPictureData(HSLFSlideShowImpl.java:772)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShow.getPictureData(HSLFSlideShow.java:547)
>       at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedPictures(HSLFExtractor.java:305)
>       at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:193)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
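
For illustration only, here is a minimal sketch of the "stop early and log" option raised in the comment above. It is not POI's or Tika's actual code. The class name LenientPictureReader and the record layout (a 4-byte little-endian length followed by that many payload bytes) are assumptions made up for this example. The idea is to check the declared length against the bytes that remain before calling System.arraycopy, so a corrupt trailing record ends the loop with a warning instead of an ArrayIndexOutOfBoundsException.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of POI or Tika.
final class LenientPictureReader {
    private static final Logger LOG =
            Logger.getLogger(LenientPictureReader.class.getName());

    // Assumed layout: repeated records of [4-byte little-endian length][payload].
    static List<byte[]> readPictures(byte[] stream) {
        List<byte[]> pictures = new ArrayList<>();
        int pos = 0;
        while (stream.length - pos >= 4) {
            int len = ByteBuffer.wrap(stream, pos, 4)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .getInt();
            pos += 4;
            // Validate before copying: a declared length that is negative or
            // runs past the end of the stream would make arraycopy throw.
            if (len < 0 || len > stream.length - pos) {
                LOG.warning("Picture record at offset " + (pos - 4)
                        + " declares " + len + " bytes but only "
                        + (stream.length - pos) + " remain; stopping early");
                break;
            }
            byte[] data = new byte[len];
            System.arraycopy(stream, pos, data, 0, len);
            pictures.add(data);
            pos += len;
        }
        return pictures;
    }
}
```

With this approach the pictures read before the bad record are still returned, which matches the behavior described above where the earlier images are extractable. Whether silently truncating is the right policy, or whether it hides a larger parsing problem, is the open question in the comment.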