Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@1a8402c
--------------------------------------------------------------------------------------
Key: TIKA-685
URL: https://issues.apache.org/jira/browse/TIKA-685
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.9
Environment: MS Windows XP Professional Version 2002 Service Pack 3
Reporter: Jaroslaw Krzeminski
A runtime error occurs while parsing an MS Word document, either with the
Apache Tika GUI app or from a program snippet like:
InputStream inputStream = new FileInputStream(docFile);
ContentHandler contentHandler = new BodyContentHandler(new BufferedWriter(new FileWriter(textFile)));
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
parser.parse(inputStream, contentHandler, metadata);
Error from Tika App Errors panel:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1a8402c
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.gui.TikaGUI.importStream(TikaGUI.java:186)
at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:99)
at javax.swing.TransferHandler.importData(Unknown Source)
at javax.swing.TransferHandler$DropHandler.drop(Unknown Source)
at java.awt.dnd.DropTarget.drop(Unknown Source)
at javax.swing.TransferHandler$SwingDropTarget.drop(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(Unknown Source)
at sun.awt.dnd.SunDropTargetEvent.dispatch(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processDropTargetEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$000(Unknown Source)
at java.awt.EventQueue$1.run(Unknown Source)
at java.awt.EventQueue$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$2.run(Unknown Source)
at java.awt.EventQueue$2.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: java.lang.NullPointerException
at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:39)
at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:61)
at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:430)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:420)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 39 more
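The trace above shows the TikaException acting only as a wrapper, with the real failure carried in the "Caused by" NullPointerException. A caller that wants to log or branch on that underlying failure could walk the cause chain. The following is a minimal sketch using only the JDK; the class name RootCause and the stand-in exception in main are hypothetical and are not part of Tika or POI.

```java
public final class RootCause {
    private RootCause() {}

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped failure reported above (assumption: a plain
        // Exception is used here instead of the real TikaException class).
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser",
                new NullPointerException());
        Throwable root = RootCause.of(wrapped);
        System.out.println(root.getClass().getName());
    }
}
```

In a real caller, the same helper would be applied to the exception caught around parser.parse(...), so that the POI-level failure is reported rather than only the generic wrapper message.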