https://issues.apache.org/bugzilla/show_bug.cgi?id=52863

--- Comment #5 from Sepp <[email protected]> 2012-04-06 16:23:22 UTC ---
Created attachment 28554
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28554
The same problem with MS PowerPoint files

Hi *,

I have the same problem with tika-app-1.1.jar and MS PowerPoint files. The
attached zip archive contains 2 PPT files. Tika.ppt is the "old" file; it
cannot be converted and fails with this error message:

System.ApplicationException : Extraction of text from the file 'Tika.ppt'
failed.
  ----> org.apache.tika.exception.TikaException : Unexpected RuntimeException
from org.apache.tika.parser.microsoft.OfficeParser@2a784f5
  ----> java.lang.ArrayIndexOutOfBoundsException : 
at TikaOnDotNet.TextExtractor.Extract(String filePath) in
d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\TextExtractor.cs:line 63
at TikaOnDotNet.tikadriver_examples.should_extract_from_ppt() in
d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\tikadriver_examples.cs:line 104
--TikaException
at org.apache.tika.parser.CompositeParser.parse(InputStream stream,
ContentHandler handler, Metadata metadata, ParseContext context)
at org.apache.tika.parser.CompositeParser.parse(InputStream stream,
ContentHandler handler, Metadata metadata, ParseContext context)
at org.apache.tika.parser.AutoDetectParser.parse(InputStream stream,
ContentHandler handler, Metadata metadata, ParseContext context)
at TikaOnDotNet.TextExtractor.Extract(String filePath) in
d:\Work\tikaondotnet.git\trunk\TikaOnDotnet\TextExtractor.cs:line 55
--ArrayIndexOutOfBoundsException
at IKVM.Runtime.ByteCodeHelper.arraycopy_primitive_1(Array src, Int32 srcStart,
Array dest, Int32 destStart, Int32 len)
at org.apache.poi.util.LittleEndian.getByteArray(Byte[] data, Int32 offset,
Int32 size)
at org.apache.poi.hpsf.UnicodeString..ctor(Byte[] , Int32 )
at org.apache.poi.hpsf.TypedPropertyValue.readValue(Byte[] , Int32 )
at org.apache.poi.hpsf.Vector.read(Byte[] , Int32 )
at org.apache.poi.hpsf.TypedPropertyValue.readValue(Byte[] , Int32 )
at org.apache.poi.hpsf.VariantSupport.read(Byte[] src, Int32 offset, Int32
length, Int64 type, Int32 codepage)
at org.apache.poi.hpsf.Property..ctor(Int64 id, Byte[] src, Int64 offset, Int32
length, Int32 codepage)
at org.apache.poi.hpsf.Section..ctor(Byte[] src, Int32 offset)
at org.apache.poi.hpsf.PropertySet.init(Byte[] , Int32 , Int32 )
at org.apache.poi.hpsf.PropertySet..ctor(InputStream stream)
at
org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(DirectoryNode
, String )
at
org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(DirectoryNode
)
at org.apache.tika.parser.microsoft.OfficeParser.parse(DirectoryNode root,
ParseContext context, Metadata metadata, XHTMLContentHandler xhtml)
at org.apache.tika.parser.microsoft.OfficeParser.parse(InputStream stream,
ContentHandler handler, Metadata metadata, ParseContext context)
at org.apache.tika.parser.CompositeParser.parse(InputStream stream,
ContentHandler handler, Metadata metadata, ParseContext context)

The second file, Tika_new.ppt, is the same file after being re-saved with MS
PowerPoint 2010 (File -> Save as...). It can be converted without any problems.

With tika-app-0.9.jar the original Tika.ppt can be converted as well, so the
error seems to have been introduced in the newer tika-app-1.1.jar?
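
The trace ends in LittleEndian.getByteArray, called from the UnicodeString
constructor while reading the summary properties. One pattern that would
produce this is a length field stored in the file that claims more bytes than
are actually present. That is only a guess from the trace, not a confirmed
diagnosis. A minimal Java sketch of that failure mode, next to a defensive
variant, is below. The class and method names are hypothetical, and this is
not POI's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: a length-prefixed UTF-16LE string reader.
final class LengthPrefixedStringSketch {

    /** Reads a 4-byte little-endian int starting at offset. */
    static int readIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    /** Trusts the declared character count, so a bad count overruns the array. */
    static String readUnchecked(byte[] data, int offset) {
        int charCount = readIntLE(data, offset);
        byte[] raw = new byte[charCount * 2];
        // Throws an IndexOutOfBoundsException when the source is too short.
        System.arraycopy(data, offset + 4, raw, 0, raw.length);
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    /** Compares the declared count with the bytes that are really available. */
    static String readChecked(byte[] data, int offset) {
        if (offset < 0 || offset > data.length - 4) {
            throw new IllegalArgumentException("no room for a length prefix at " + offset);
        }
        int charCount = readIntLE(data, offset);
        long byteCount = 2L * charCount;
        int available = data.length - offset - 4;
        if (charCount < 0 || byteCount > available) {
            throw new IllegalArgumentException(
                    "declared " + charCount + " chars but only " + available + " bytes remain");
        }
        return new String(data, offset + 4, (int) byteCount, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] good = {2, 0, 0, 0, 'H', 0, 'i', 0};
        byte[] truncated = {5, 0, 0, 0, 'H', 0}; // claims 5 chars, holds 1
        System.out.println(readChecked(good, 0));
        try {
            readUnchecked(truncated, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked read failed: " + e);
        }
    }
}
```

With the checked variant a damaged property would surface as a descriptive
error instead of a raw array exception, which a caller could then catch and
skip while still extracting the slide text.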

Thank you
Sepp

