Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "GrobidJournalParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/GrobidJournalParser?action=diff&rev1=2&rev2=3

Comment:
- update to deal with the new REST Grobid

  
  Check the `out` directory; you should see `*.tei.xml` files in there.
  
+ === Start the GROBID Service ===
+ 
+ To use GROBID with Tika, you need to start the 
[[http://grobid.readthedocs.org/en/latest/Grobid-service/|GROBID Service]] by 
following the steps below. Note that the service starts on port 8080 by 
default; this can be changed in the Jetty properties by editing 
[[https://github.com/kermitt2/grobid/blob/master/grobid-service/pom.xml#L180|line 180]] 
of 
[[https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-service/pom.xml|Grobid Service's pom.xml]].
+ 
+  1. `cd $HOME/src/grobid/grobid-service`
+  2. `mvn -Dmaven.test.skip=true jetty:run-war`
+ 
+ Once the server is started, you're good to proceed!
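
Before moving on, it can help to confirm that something is actually listening on the service port. The following is a minimal sketch, assuming bash (it relies on bash's built-in `/dev/tcp` redirection) and the default port 8080 mentioned above. The helper names `port_open` and `wait_for_port` are hypothetical and are not part of GROBID or Tika; the check only proves that the port accepts TCP connections, not that GROBID itself is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GROBID or Tika): succeeds when a TCP
# connection to host:port can be opened via bash's /dev/tcp redirection.
port_open() {
  local host="$1" port="$2"
  # Run in a subshell so the file descriptor is closed again on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Hypothetical helper: poll once per second until the port accepts
# connections or the given number of attempts (default 30) is used up.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}"
  local i
  for ((i = 0; i < attempts; i++)); do
    if port_open "$host" "$port"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

With the functions loaded, `wait_for_port localhost 8080 60 && echo "port 8080 is accepting connections"` would wait up to a minute for the Jetty process started in step 2.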
+ 
  == Running GROBID using Tika-App ==
  
  Grab the latest 1.11-SNAPSHOT or later version of Tika-app and run Grobid by 
following the commands below.
  
- First we need to create the GrobidExtractor.properties file that points to 
Grobid Home, and to its configuration directory. My file looks like the 
following:
+ First we need to create the GrobidExtractor.properties file that points to 
the Grobid REST Service. My file looks like the following:
  
  {{{
+ grobid.server.url=http://localhost:8080
- grobid.home=/Users/mattmann/git/grobid/grobid-home
- 
grobid.properties=/Users/mattmann/git/grobid/grobid-home/config/grobid.properties
  }}}
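
As a rough sketch of the new configuration, the commands below write a one-line `GrobidExtractor.properties` containing the `grobid.server.url` key shown above and read the value back. The temporary directory and the `PROPS_DIR`, `props_file`, and `grobid_url` variable names are assumptions made for illustration; in practice the file lives wherever your classpath expects it.

```shell
# Hypothetical location for illustration only; override PROPS_DIR to point
# at the directory that should hold GrobidExtractor.properties.
props_dir="${PROPS_DIR:-$(mktemp -d)}"
props_file="$props_dir/GrobidExtractor.properties"

# Write the key shown in the wiki example, pointing at the default port.
printf 'grobid.server.url=%s\n' "http://localhost:8080" > "$props_file"

# Read the value back to confirm what was written.
grobid_url="$(sed -n 's/^grobid\.server\.url=//p' "$props_file")"
echo "$grobid_url"
```

If the GROBID service was started on a different port, the URL written here would need to match it.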
  
  You can download 
[[https://raw.githubusercontent.com/chrismattmann/grobidparser-resources/master/org/apache/tika/parser/journal/GrobidExtractor.properties|GrobidExtractor.properties]]
 as a sample. Or better yet, you can install the following Github project and 
then modify the GrobidExtractor.properties file accordingly.
@@ -35, +43 @@

   1. `cd $HOME/src && git clone 
https://github.com/chrismattmann/grobidparser-resources.git`
   2. edit 
`$HOME/src/grobidparser-resources/org/apache/tika/parser/journal/GrobidExtractor.properties`
  
- Now you can run GROBID via Tika-app with the following command on a sample 
PDF file. Note the order of the classpath - it is extremely important to keep 
the order as it allows Tika and its Jars to come first, and GROBID (and its 
large numbers of Jars) to come last.
+ Now you can run GROBID via Tika-app with the following command on a sample 
PDF file.
  
  {{{
- java -classpath 
$HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar:$HOME/src/grobid/lib/\*
 org.apache.tika.cli.TikaCLI 
--config=$HOME/src/grobidparser-resources/tika-config.xml -J 
$HOME/src/grobid/papers/ICSE06.pdf
+ java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar 
org.apache.tika.cli.TikaCLI 
--config=$HOME/src/grobidparser-resources/tika-config.xml -J 
$HOME/src/grobid/papers/ICSE06.pdf
  }}}
  
  Which should produce as output (e.g., if piped to `python -mjson.tool` for 
pretty printing):
@@ -113, +121 @@

  
  == Will this work from Tika Server? ==
  
- It sure will! When you start Tika Server, use the following command, and 
ordering of the classpath is extremely important, as with Tika-app.
+ It sure will! When you start Tika Server, use the following command.
  
  {{{
- java -classpath 
$HOME/src/grobidparser-resources/:tika-server-1.11-SNAPSHOT.jar:$HOME/src/grobid/lib/\*
 org.apache.tika.server.TikaServerCli --config 
$HOME/src/grobidparser-resources/tika-config.xml
+ java -classpath 
$HOME/src/grobidparser-resources/:tika-server-1.11-SNAPSHOT.jar 
org.apache.tika.server.TikaServerCli --config 
$HOME/src/grobidparser-resources/tika-config.xml
  }}}
  
  Then, PUT a file to Tika-server like so:
