Thanks Nick. That was just a copy-and-paste error in the email. I was able to figure out how to bypass the JournalParser and just use the PDF ones.

--Pei
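
For anyone else hitting the same trace: one way to keep the auto-detect parser away from the JournalParser is to exclude it in a Tika config file. The fragment below is a minimal sketch, assuming a Tika version whose config format supports parser-exclude. The file name tika-config.xml is only a placeholder, and this is not necessarily the approach used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tika-config.xml (placeholder name): keep the default parsers but
     drop the Grobid-backed JournalParser, so PDFs go to the regular
     in-process PDF parser instead of a REST call to localhost:8080 -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.journal.JournalParser"/>
    </parser>
  </parsers>
</properties>
```

The config can then be loaded with new TikaConfig("tika-config.xml") and passed to new Tika(config), which should leave the rest of the in-process code from the original message unchanged.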

On Wed, 24 Feb 2016, Pei Chen wrote:
> Does the default pdf parser using auto detect parser require to tika
> to run in server mode?

No

> It seems to try and open an http connection to localhost:8080 by
> default? Can it run in-process?

The stacktrace shows you're not using the PDF parser:

>     at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
>     at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)

See https://wiki.apache.org/tika/GrobidJournalParser for how to configure
the grobid parser if you want to use it

Nick

On Wed, Feb 24, 2016 at 5:15 PM, Pei Chen <[email protected]> wrote:
> Hi tika-dev,
> Does the default pdf parser using auto detect parser require to tika
> to run in server mode? It seems to try and open an http connection to
> localhost:8080 by default? Can it run in-process?
>
> ...<snip>
> FileInputStream stream = new FileInputStream("src/test/resources/somepdf.pdf");
> //works fine in-process with other doc types.
> Tika tika = new Tika();
> tika.parseToString(stream);
> ...<snip>
>
> 24 Feb 2016 17:06:24 WARN PhaseInterceptorChain - Interceptor for
> {http://localhost:8080/processHeaderDocument}WebClient has thrown
> exception, unwinding now
>
> org.apache.cxf.interceptor.Fault: No message body writer has been
> found for class org.apache.cxf.jaxrs.ext.multipart.MultipartBody,
> ContentType: multipart/form-data
>     at org.apache.cxf.jaxrs.client.WebClient$BodyWriter.doWriteBody(WebClient.java:1220)
>     at org.apache.cxf.jaxrs.client.AbstractClient$AbstractBodyWriter.handleMessage(AbstractClient.java:1044)
>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>     at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:623)
>     at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1084)
>     at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:883)
>     at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:854)
>     at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:320)
>     at org.apache.cxf.jaxrs.client.WebClient.post(WebClient.java:329)
>     at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
>     at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:496)
>     at org.apache.tika.Tika.parseToString(Tika.java:571)
