The "MockParser" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/MockParser?action=diff&rev1=1&rev2=2

  == Background ==
  So, you've tried Tika on a couple of files and all works well. Problem solved!
+ No.
+
- No. In very rare cases, Tika can so some really bad things. We try to fix these problems when we can, but if history is any indication (e.g. [[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are processing millions of files, you'll need to defend against:
+ In very rare cases, Tika can so some really bad things. We try to fix these problems when we can, but if history is any indication (e.g. [[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are processing millions/billions of files from the wild, you'll need to defend against:
   1. Regular catchable exceptions
   2. !OutOfMemory errors which can put the jvm in an unreliable state

@@ -24, +26 @@

  `java -cp "bin/*" org.apache.tika.TikaCLI mock_example.xml`

  === Tika-server ===
- Place the tika-server.jar and the tika-core.tests.jar in a "bin directory.
+ Place the tika-server.jar and the tika-core.tests.jar in a "bin" directory.

- `java -cp "serverbin/*" org.apache.tika.server.TikaServerCli`
+ `java -cp "bin/*" org.apache.tika.server.TikaServerCli`
+
+ Then curl away:
+
+ `curl -T mock_example.xml http://localhost:9998/rmeta/text`

  === Your Framework ===
  Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and then add some mock.xml files to your batch of documents.
-
- Then curl away:
-
- `curl -T mock_example.xml http://localhost:9998/rmeta/text`

  === Mock options ===
  See the mock example.xml file in tika-parsers/src/test/resources/test-documents/mock.

@@ -84, +86 @@

  </mock>
  ``

+ == References ==
+  1. [[http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-web-content-nanite/|Tika to Ride]]
+  2. [[http://events.linuxfoundation.org/sites/events/files/slides/TikaEval_ACNA15_allison_herceg_v2.pdf|Evaluating Text Extraction]]
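One way to exercise the defenses the Background section asks for is to run each parse in its own child process, so that a catchable exception or an !OutOfMemory error only takes down that child and not the batch driver. The following is a minimal sketch of that idea, not part of the wiki page: the helper name `run_guarded`, the outcome labels, and the timeout are all hypothetical, and it assumes the child signals failure through a nonzero exit code.

```python
import subprocess


def run_guarded(cmd, timeout_s):
    """Run one parse command in a child process and classify the outcome.

    Hypothetical helper: returns "ok", "failed", or "timeout".
    Assumes a nonzero exit code means the parse failed.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard extracted text in this sketch
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

With the jars laid out as in the TikaCLI example, a call might look like `run_guarded(["java", "-cp", "bin/*", "org.apache.tika.TikaCLI", "mock_example.xml"], 60)`. The 60-second limit is an arbitrary placeholder, and the timeout itself is an added precaution rather than something the page prescribes.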
