Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "GrobidJournalParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/GrobidJournalParser?action=diff&rev1=2&rev2=3

Comment:
update to deal with the new REST Grobid

  Check the `out` directory; you should see `*.tei.xml` files in there.

+ === Start the GROBID Service ===
+
+ To use GROBID with Tika, you need to start the [[http://grobid.readthedocs.org/en/latest/Grobid-service/|GROBID Service]]. To do so, perform the steps below. Note that the service starts on port 8080 by default; you can change this in the Jetty properties by editing [[https://github.com/kermitt2/grobid/blob/master/grobid-service/pom.xml#L180|line 180]] of [[https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-service/pom.xml|Grobid Service's pom.xml]].
+
+  1. `cd $HOME/src/grobid/grobid-service`
+  2. `mvn -Dmaven.test.skip=true jetty:run-war`
+
+ Once the server has started, you're ready to proceed!
+
  == Running GROBID using Tika-App ==

  Grab the latest 1.11-SNAPSHOT (or later) version of Tika-app and run GROBID using the commands below.

- First we need to create the GrobidExtractor.properties file that points to Grobid Home, and to its configuration directory. My file looks like the following:
+ First we need to create the GrobidExtractor.properties file that points to the GROBID REST service. My file looks like the following:

  {{{
+ grobid.server.url=http://localhost:8080
- grobid.home=/Users/mattmann/git/grobid/grobid-home
- grobid.properties=/Users/mattmann/git/grobid/grobid-home/config/grobid.properties
  }}}

  You can download [[https://raw.githubusercontent.com/chrismattmann/grobidparser-resources/master/org/apache/tika/parser/journal/GrobidExtractor.properties|GrobidExtractor.properties]] as a sample. Or, better yet, clone the following GitHub project and then modify its GrobidExtractor.properties file accordingly.

@@ -35, +43 @@

   1. `cd $HOME/src && git clone https://github.com/chrismattmann/grobidparser-resources.git`
   2. edit `$HOME/src/grobidparser-resources/org/apache/tika/parser/journal/GrobidExtractor.properties`

- Now you can run GROBID via Tika-app with the following command on a sample PDF file. Note the order of the classpath - it is extremely important to keep the order as it allows Tika and its Jars to come first, and GROBID (and its large numbers of Jars) to come last.
+ Now you can run GROBID via Tika-app on a sample PDF file with the following command.

  {{{
- java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar:$HOME/src/grobid/lib/\* org.apache.tika.cli.TikaCLI --config=$HOME/src/grobidparser-resources/tika-config.xml -J $HOME/src/grobid/papers/ICSE06.pdf
+ java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=$HOME/src/grobidparser-resources/tika-config.xml -J $HOME/src/grobid/papers/ICSE06.pdf
  }}}

  This should produce output like the following (e.g., when piped to `python -mjson.tool` for pretty printing):

@@ -113, +121 @@

  == Will this work from Tika Server? ==

- It sure will! When you start Tika Server, use the following command, and ordering of the classpath is extremely important, as with Tika-app.
+ It sure will! When you start Tika Server, use the following command.

  {{{
- java -classpath $HOME/src/grobidparser-resources/:tika-server-1.11-SNAPSHOT.jar:$HOME/src/grobid/lib/\* org.apache.tika.server.TikaServerCli --config $HOME/src/grobidparser-resources/tika-config.xml
+ java -classpath $HOME/src/grobidparser-resources/:tika-server-1.11-SNAPSHOT.jar org.apache.tika.server.TikaServerCli --config $HOME/src/grobidparser-resources/tika-config.xml
  }}}

  Then, PUT a file to Tika-server like so:
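Under this revision, `GrobidExtractor.properties` shrinks to a single key, `grobid.server.url`. If you would rather script that edit than make it by hand, a minimal POSIX shell sketch might look like the following. The function names `write_grobid_props` and `read_grobid_url` are hypothetical helpers, not part of Tika or GROBID. The package path `org/apache/tika/parser/journal/` is taken from the grobidparser-resources layout above, and the default `http://localhost:8080` assumes GROBID is still on its default port.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Tika or GROBID) for managing the
# GrobidExtractor.properties file that Tika picks up from its classpath.

# write_grobid_props RESOURCES_DIR [URL]
# Overwrites RESOURCES_DIR/org/apache/tika/parser/journal/GrobidExtractor.properties
# with a single grobid.server.url=URL line and prints the file's path.
# URL defaults to http://localhost:8080 (assumed: GROBID's default port).
write_grobid_props() {
  resources_dir=$1
  url=${2:-http://localhost:8080}
  props="$resources_dir/org/apache/tika/parser/journal/GrobidExtractor.properties"
  mkdir -p "$(dirname "$props")" || return 1
  printf 'grobid.server.url=%s\n' "$url" > "$props" || return 1
  printf '%s\n' "$props"
}

# read_grobid_url PROPS_FILE
# Prints the value of grobid.server.url from PROPS_FILE (nothing if absent).
read_grobid_url() {
  sed -n 's/^grobid\.server\.url=//p' "$1"
}
```

For example, `write_grobid_props "$HOME/src/grobidparser-resources"` would rewrite the cloned sample file to point at a local GROBID service on port 8080; pass a second argument if you changed the Jetty port. Because the helper replaces the whole file, it suits the one-key format shown in the diff rather than files that carry extra settings.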
