Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs, using the code
below. It works well for a few files, but when I process many files I get a
NullPointerException in the PDF parser. I think the NullPointerException is caused
by some memory-related problem. I see that you already fixed this issue in
https://issues.apache.org/jira/browse/TIKA-1457, but the error is occurring again
for me in Tika 1.8. I also tried reverting to Tika 1.6 and still get the same error.
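One way to test the memory theory is to log heap usage immediately before and after each parse call. If the NullPointerExceptions only show up when used heap is close to the maximum, that supports the theory; if they show up at normal heap levels, memory is less likely to be the cause. The sketch below uses only java.lang.Runtime. HeapProbe and its methods are hypothetical names, not part of Tika, and the code avoids Java 7+ syntax because Method.java:597 in the trace appears to indicate a Java 6 runtime.

```java
/**
 * Hypothetical helper (not part of Tika): reports JVM heap usage so it can be
 * logged next to each parse and correlated with the NullPointerExceptions.
 */
final class HeapProbe {
    private static final long MB = 1024L * 1024L;

    private HeapProbe() {
    }

    /** Bytes currently in use on the heap (allocated minus free). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Fraction of the maximum heap (-Xmx) currently in use, from 0.0 to 1.0. */
    static double usedFraction() {
        long max = Runtime.getRuntime().maxMemory();
        // maxMemory() returns Long.MAX_VALUE when the JVM has no inherent limit.
        if (max == Long.MAX_VALUE || max <= 0) {
            return 0.0;
        }
        return (double) usedBytes() / (double) max;
    }

    /** One-line summary suitable for a log message. */
    static String describe(String label) {
        Runtime rt = Runtime.getRuntime();
        return label + ": used=" + (usedBytes() / MB) + "MB"
                + " total=" + (rt.totalMemory() / MB) + "MB"
                + " max=" + (rt.maxMemory() / MB) + "MB";
    }
}

class HeapProbeDemo {
    public static void main(String[] args) {
        System.out.println(HeapProbe.describe("before parse"));
        // In the real code, pdfparser.parse(...) would run here.
        System.out.println(HeapProbe.describe("after parse"));
    }
}
```

In the JBoss code the describe(...) output would go to the existing logger, together with the report ID, so failing reports can be matched against heap levels.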
The JIRA issue mentions that 1-2% of the corpus can encounter that issue, so it
looks like the issue is still there.
Any suggestions?
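To compare what I see against the 1-2% figure from the JIRA issue, each file can be parsed independently and the failures counted, so one bad PDF is recorded instead of ending the batch. This is a minimal sketch under stated assumptions: Extractor and BatchResult are hypothetical names, and the extractor stands in for the real Tika call. It is written without lambdas for Java 6 compatibility.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback standing in for the real Tika call (e.g. pdfparser.parse). */
interface Extractor<I> {
    String extract(I input) throws Exception;
}

/**
 * Hypothetical batch helper: runs the extractor on each input independently,
 * so an exception on one file is recorded instead of ending the whole batch.
 */
final class BatchResult<I> {
    final List<I> succeeded = new ArrayList<I>();
    final List<I> failed = new ArrayList<I>();

    /** Share of inputs that failed, in percent (0.0 when there were no inputs). */
    double failurePercent() {
        int total = succeeded.size() + failed.size();
        return total == 0 ? 0.0 : 100.0 * failed.size() / total;
    }

    static <T> BatchResult<T> run(List<T> inputs, Extractor<T> extractor) {
        BatchResult<T> result = new BatchResult<T>();
        for (T input : inputs) {
            try {
                extractor.extract(input);
                result.succeeded.add(input);
            } catch (Exception e) {
                // NullPointerException is a RuntimeException, so it is caught here too.
                // Real code would also log e together with the input's report ID.
                result.failed.add(input);
            }
        }
        return result;
    }
}

class BatchResultDemo {
    public static void main(String[] args) {
        List<String> ids = new ArrayList<String>();
        for (int i = 0; i < 100; i++) {
            ids.add("RPT_" + i);
        }
        BatchResult<String> r = BatchResult.run(ids, new Extractor<String>() {
            public String extract(String id) {
                // Simulate the parser failing on two documents out of a hundred.
                if (id.equals("RPT_7") || id.equals("RPT_42")) {
                    throw new NullPointerException();
                }
                return "text of " + id;
            }
        });
        // prints "[RPT_7, RPT_42] failed (2.0%)"
        System.out.println(r.failed + " failed (" + r.failurePercent() + "%)");
    }
}
```

Keeping the list of failed IDs also makes it easy to re-run just those files later, which helps separate "bad file" failures from "too many files" failures.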
Tika version:
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-server</artifactId>
    <version>1.8</version>
</dependency>
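Since the code calls PDFParser directly inside the application rather than sending documents to a running Tika server, a leaner dependency would typically be tika-parsers, the module that contains org.apache.tika.parser.pdf.PDFParser (tika-server already pulls it in transitively). This sketch assumes nothing from tika-server itself is needed and keeps the same 1.8 version:

```xml
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.8</version>
</dependency>
```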
I am running it as part of a J2EE app in JBoss 1.7.
Code:
// Parse the PDF content using Apache Tika
InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(input));
    // Disable the write limit (-1 = unlimited)
    contenthandler = new BodyContentHandler(-1);
    metadata = new Metadata();
    pdfparser = new PDFParser();
    context = new ParseContext();
    pdfparser.parse(is, contenthandler, metadata, context);
    docBody = contenthandler.toString();
    //System.out.println(contenthandler.toString());
}
catch (Exception e) {
    System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
    if (is == null)
        System.out.println("The input stream is a null object");
    e.printStackTrace();
    logger.log(Level.SEVERE, e.getMessage(), e);
}
finally {
    if (is != null) is.close();
    contenthandler = null;
    metadata = null;
    pdfparser = null;
    context = null;
}
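Before blaming memory, it may be worth ruling out that the failing 1-2% are empty, truncated, or non-PDF files. A cheap check is to confirm that the input file is non-empty and carries the %PDF- signature near its start, and to re-parse a failing report such as RPT_764268 on its own: if it also fails in isolation, the file rather than memory pressure is the likelier trigger. PdfHeaderCheck is a hypothetical helper, not part of Tika, and the 1024-byte search window is an assumption based on how PDF readers commonly tolerate a little leading data before the header.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical pre-check (not part of Tika): does the file look like a PDF at all? */
final class PdfHeaderCheck {
    private static final byte[] SIGNATURE = {'%', 'P', 'D', 'F', '-'};
    // Assumption: the signature is expected within the first 1024 bytes.
    private static final int SEARCH_WINDOW = 1024;

    private PdfHeaderCheck() {
    }

    static boolean looksLikePdf(File file) throws IOException {
        if (!file.isFile() || file.length() == 0) {
            return false; // missing or empty files cannot be parsed
        }
        byte[] head = new byte[SEARCH_WINDOW];
        int n = 0;
        InputStream in = new FileInputStream(file);
        try {
            int r;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        } finally {
            in.close(); // Java 6 style; no try-with-resources
        }
        return indexOf(head, n, SIGNATURE) >= 0;
    }

    /** Naive search for pattern within the first len bytes of data; -1 if absent. */
    static int indexOf(byte[] data, int len, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= len; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}

class PdfHeaderCheckDemo {
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " looks like a PDF: "
                    + PdfHeaderCheck.looksLikePdf(new File(path)));
        }
    }
}
```

In the code above, the check would run on input before pdfparser.parse(...), logging the report ID whenever it returns false.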
Exception:
I am including only the NullPointerException from the parser below.
10:53:11,696 INFO [stdout] (Thread-11 (HornetQ-client-global-threads-1619682129)) Exception in updating docbody for report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 (HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
	at com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
	at com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
	at com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
	at com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
	at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
	at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
	at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
	at com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
	at com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
	at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
	at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
	at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
	at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
Thanks,
MG
Product Development Team