Hi all,

As an addendum to my previous e-mail: I got the exception below while using Apache Tika 1.6 to parse a large number of PDF documents. As before, it occurs only for a few PDF documents, not for all of them.

TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@29fe5969

Exception stack trace:

stdout] (Thread-12 (HornetQ-client-global-threads-248507153)) Exception in updating docbody for report ==> RPT_720610
[Server:research-etl-server] 21:29:23,817 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@29fe5969
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:250)
[Server:research-etl-server] 21:29:23,818 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:888)
[Server:research-etl-server] 21:29:23,820 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:983)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:678)
[Server:research-etl-server] 21:29:23,821 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
[Server:research-etl-server] 21:29:23,822 ERROR [stderr] (Thread-12 (HornetQ-client-global-threads-248507153)) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
[Server:research-etl-server] 21:29:23,822 WARN [org.hornetq.cor

Thanks,
MG

From: Mouthgalya Ganapathy
Sent: Wednesday, June 03, 2015 9:47 PM
To: '[email protected]'; '[email protected]'
Subject: Pdf parser - Null pointer exception

Hi all,

I am trying to use Apache Tika 1.8 to extract content from PDF files, using the code below. It works well for a few files, but when I read many files I see a NullPointerException in the PDF parser. I suspect the NullPointerException is caused by a memory problem. I see that you already fixed this issue in TIKA-1457 (https://issues.apache.org/jira/browse/TIKA-1457); however, the error still occurs for me in Tika 1.8. I reverted to Tika 1.6 and still get the same error. The JIRA issue mentions that 1-2% of the corpus can encounter the problem, so it looks like the issue is still there. Any suggestions?

Tika version:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-server</artifactId>
        <version>1.8</version>
    </dependency>

I am running it as part of a J2EE application in JBoss 1.7.

Code:

    // Parse the PDF content using Apache Tika
    InputStream is = null;
    try {
        is = new BufferedInputStream(new FileInputStream(input));
        // Disable the write limit.
        contenthandler = new BodyContentHandler(-1);
        metadata = new Metadata();
        pdfparser = new PDFParser();
        context = new ParseContext();
        pdfparser.parse(is, contenthandler, metadata, context);
        docBody = contenthandler.toString();
        // System.out.println(contenthandler.toString());
    } catch (Exception e) {
        System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
        if (is == null) System.out.println("The input stream is a null object");
        e.printStackTrace();
        logger.log(Level.SEVERE, e.getMessage(), e);
    } finally {
        if (is != null) is.close();
        contenthandler = null;
        metadata = null;
        pdfparser = null;
        context = null;
    }

Exception:

I am including only the NullPointerException from the parser below.

10:53:11,696 INFO [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at
org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team
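
P.S. The TIKA-198 message indicates that the TikaException wraps an underlying IOException, but the first log excerpt is cut off before any "Caused by:" section. A minimal sketch of how the innermost cause could be logged is below. It uses only the standard library; the class name RootCauseDemo, the helper rootCause, and the exception messages in main are hypothetical stand-ins, not part of Tika or of the application code above.

```java
import java.io.IOException;

public class RootCauseDemo {

    /** Walks the getCause() chain and returns the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause, or when a throwable
        // reports itself as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapper seen in the log. In the real catch block
        // this would be the caught exception (assumed here to be an
        // org.apache.tika.exception.TikaException with an IOException cause).
        Exception wrapped = new Exception(
                "TIKA-198: Illegal IOException from <parser>",
                new IOException("hypothetical underlying failure"));

        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

In the catch block of the code above, logging rootCause(e) together with report.getDocID() would tie each failing document to its underlying error, which may help to tell memory problems apart from malformed PDFs.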
