https://issues.apache.org/bugzilla/show_bug.cgi?id=52863
--- Comment #5 from Sepp <[email protected]> 2012-04-06 16:23:22 UTC ---
Created attachment 28554
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28554
The same problem with MS PowerPoint files

Hi *,

I have the same problem with tika-app-1.1.jar and MS PowerPoint files.
In the zip archive you can find 2 PPT files. The file Tika.ppt is the "old"
file, which cannot be converted. It fails with this error message:

System.ApplicationException : Extraction of text from the file 'Tika.ppt' failed.
  ----> org.apache.tika.exception.TikaException : Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2a784f5
  ----> java.lang.ArrayIndexOutOfBoundsException :
   at TikaOnDotNet.TextExtractor.Extract(String filePath) in d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\TextExtractor.cs:line 63
   at TikaOnDotNet.tikadriver_examples.should_extract_from_ppt() in d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\tikadriver_examples.cs:line 104
--TikaException
   at org.apache.tika.parser.CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
   at org.apache.tika.parser.CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
   at org.apache.tika.parser.AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
   at TikaOnDotNet.TextExtractor.Extract(String filePath) in d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\TextExtractor.cs:line 55
--ArrayIndexOutOfBoundsException
   at IKVM.Runtime.ByteCodeHelper.arraycopy_primitive_1(Array src, Int32 srcStart, Array dest, Int32 destStart, Int32 len)
   at org.apache.poi.util.LittleEndian.getByteArray(Byte[] data, Int32 offset, Int32 size)
   at org.apache.poi.hpsf.UnicodeString..ctor(Byte[] , Int32 )
   at org.apache.poi.hpsf.TypedPropertyValue.readValue(Byte[] , Int32 )
   at org.apache.poi.hpsf.Vector.read(Byte[] , Int32 )
   at org.apache.poi.hpsf.TypedPropertyValue.readValue(Byte[] , Int32 )
   at org.apache.poi.hpsf.VariantSupport.read(Byte[] src, Int32 offset, Int32 length, Int64 type, Int32 codepage)
   at org.apache.poi.hpsf.Property..ctor(Int64 id, Byte[] src, Int64 offset, Int32 length, Int32 codepage)
   at org.apache.poi.hpsf.Section..ctor(Byte[] src, Int32 offset)
   at org.apache.poi.hpsf.PropertySet.init(Byte[] , Int32 , Int32 )
   at org.apache.poi.hpsf.PropertySet..ctor(InputStream stream)
   at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(DirectoryNode , String )
   at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(DirectoryNode )
   at org.apache.tika.parser.microsoft.OfficeParser.parse(DirectoryNode root, ParseContext context, Metadata metadata, XHTMLContentHandler xhtml)
   at org.apache.tika.parser.microsoft.OfficeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
   at org.apache.tika.parser.CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)

The second file, Tika_new.ppt, is the same file re-saved with MS PowerPoint
2010 (File -> Save as...). It can be converted without any problems.

With tika-app-0.9.jar the file Tika.ppt can be converted too, so is the error
a regression in the new version, tika-app-1.1.jar?

Thank you
Sepp
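For illustration only: the innermost frames of the trace (arraycopy called from LittleEndian.getByteArray, itself called from the UnicodeString constructor) are consistent with a length read from the property set being larger than the bytes remaining in the buffer. That is an assumption drawn from the trace, not a confirmed diagnosis. The sketch below is not the POI source; the class LengthPrefixedRead and the helpers copyUnchecked and copyChecked are hypothetical names used to show how an unchecked copy with an oversized length fails, and how a bounds check would turn it into a clearer error.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names exist in POI or Tika.
class LengthPrefixedRead {

    // Copies `size` bytes starting at `offset` without validating the range.
    // If `size` comes from a length field that overstates the data available,
    // System.arraycopy throws an IndexOutOfBoundsException (on common JVMs,
    // the ArrayIndexOutOfBoundsException subclass seen in the trace).
    static byte[] copyUnchecked(byte[] data, int offset, int size) {
        byte[] out = new byte[size];
        System.arraycopy(data, offset, out, 0, size);
        return out;
    }

    // Same copy, but the requested range is validated first so the caller
    // gets a descriptive error instead of a bare index failure.
    static byte[] copyChecked(byte[] data, int offset, int size) {
        if (offset < 0 || size < 0 || size > data.length - offset) {
            throw new IllegalArgumentException(
                "requested " + size + " bytes at offset " + offset
                + " but buffer holds " + data.length);
        }
        return Arrays.copyOfRange(data, offset, offset + size);
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4};

        // In-range request: both helpers behave the same.
        System.out.println(Arrays.toString(copyChecked(buffer, 1, 2)));

        // Oversized request: only 2 bytes remain after offset 2.
        try {
            copyUnchecked(buffer, 2, 8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked copy failed: " + e.getClass().getName());
        }

        try {
            copyChecked(buffer, 2, 8);
        } catch (IllegalArgumentException e) {
            System.out.println("checked copy rejected: " + e.getMessage());
        }
    }
}
```

Whether the declared length in Tika.ppt is actually wrong, or the newer parsing code misreads a valid value, cannot be determined from the trace alone; comparing the summary property streams of Tika.ppt and Tika_new.ppt would be the next step.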
