The "MockParser" page has been changed by TimothyAllison:
= MockParser =
== Background ==
So, you've tried Tika on a couple of files and all works well. Problem solved!
No. In very rare cases, Tika can do some really bad things. We try to fix
these problems when we can, but history (e.g.
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]) suggests that
if you are processing millions of files, you'll need to defend against:
1. Regular catchable exceptions
2. !OutOfMemory errors, which can put the JVM in an unreliable state
3. Permanent hangs (Tika can chew up massive amounts of resources and go
4. Security vulnerabilities (e.g.
Please note that for 3 (permanent hangs), you cannot terminate the Thread.
Thread's ''stop'', ''suspend'' and ''destroy'' methods sound like they'll do
the trick, but they won't. '''You need to kill the entire process.'''
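To illustrate why a hung parser thread cannot simply be stopped, here is a minimal sketch using only the JDK. The class name `StubbornThreadDemo` and the spinning loop are hypothetical stand-ins for a parser that never checks its interrupt flag; this is not Tika code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class StubbornThreadDemo {

    /**
     * Starts a thread that spins without ever checking its interrupt flag,
     * interrupts it, waits up to waitMillis (must be positive), and reports
     * whether the thread is still alive.
     */
    static boolean survivesInterrupt(long waitMillis) {
        // Escape hatch for the demo only; a real hung parser offers no such switch.
        AtomicBoolean release = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            long spins = 0;
            while (!release.get()) {
                spins++; // busy work that never consults the interrupt flag
            }
        }, "stubborn-parser");
        worker.setDaemon(true); // do not keep the JVM alive for this demo
        worker.start();
        try {
            worker.interrupt();      // only sets a flag that the loop ignores
            worker.join(waitMillis); // give the thread a chance to stop
            return worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo was interrupted", e);
        } finally {
            release.set(true); // let the demo thread exit
        }
    }

    public static void main(String[] args) {
        System.out.println("alive after interrupt: " + survivesInterrupt(200));
    }
}
```

The worker keeps running after `interrupt()` because interruption is cooperative. This is why a watchdog that lives in the same JVM as the parser is not enough for case 3.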
As of Tika 1.15, the tika-core-tests.jar includes a MockParser that lets you
test your framework against items 1-3. Simply add that jar to your class path
and then include a <mock> XML file in your set of test documents, and crash,
crash away.
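One way to seed a test corpus is to generate a mock file per failure mode. The sketch below is a rough illustration, not part of Tika: the class `MockFileWriter`, the output file names, and the assumption that each action is wrapped in a `<mock>` root element are mine. The `<throw>` and `<hang>` elements are copied from the example file in the Mock options section.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MockFileWriter {

    /** Wraps one action element in the (assumed) <mock> root element. */
    static String mockDocument(String actionElement) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<mock>\n"
                + "  " + actionElement + "\n"
                + "</mock>\n";
    }

    /** Writes one mock file per failure mode into dir and returns dir. */
    static Path writeMockFiles(Path dir) {
        try {
            Files.createDirectories(dir);
            // Case 1: a regular catchable exception.
            Files.writeString(dir.resolve("mock_exception.xml"), mockDocument(
                    "<throw class=\"java.io.IOException\">not another IOException</throw>"));
            // Case 3: a hang; raise millis to simulate a hang that outlives your timeout.
            Files.writeString(dir.resolve("mock_hang.xml"), mockDocument(
                    "<hang millis=\"100\" heavy=\"true\" pulse_millis=\"10\" interruptible=\"true\" />"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical output directory; point this at your own test corpus.
        System.out.println("wrote mock files to " + writeMockFiles(Paths.get("mock-docs")));
    }
}
```

Mixing a handful of these files into a batch of ordinary documents lets you check that one bad file does not take down the whole run.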
== Usage ==
=== Tika-app ===
Place the tika-app.jar and the tika-core-tests.jar in a "bin" directory.
`java -cp "bin/*" org.apache.tika.TikaCLI mock_example.xml`
=== Tika-server ===
Place the tika-server.jar and the tika-core-tests.jar in a "serverbin" directory.
`java -cp "serverbin/*" org.apache.tika.server.TikaServerCli`
Then curl away:
`curl -T mock_example.xml http://localhost:9998/rmeta/text`
=== Your Framework ===
Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and
then add some mock.xml files to your batch of documents.
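In your own framework, one way to survive permanent hangs and !OutOfMemory errors is to run the parse in a child process and kill that process when it exceeds a time budget. The following is a minimal sketch under that assumption, for Java 9 or later. The class `ChildProcessWatchdog`, its `Outcome` enum, and the 60-second budget are hypothetical. The command line in `main` is the Tika-app invocation shown above.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ChildProcessWatchdog {

    enum Outcome { COMPLETED, FAILED, KILLED_AFTER_TIMEOUT }

    /** Runs command in a child process and kills it if it exceeds timeoutMillis. */
    static Outcome run(List<String> command, long timeoutMillis) {
        Process child;
        try {
            child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so a chatty child cannot block on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            return Outcome.FAILED; // the process could not be started
        }
        try {
            if (!child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                // The child overran its budget: kill the whole process.
                child.destroyForcibly().waitFor();
                return Outcome.KILLED_AFTER_TIMEOUT;
            }
            // A non-zero exit code covers crashes, including an OutOfMemoryError.
            return child.exitValue() == 0 ? Outcome.COMPLETED : Outcome.FAILED;
        } catch (InterruptedException e) {
            child.destroyForcibly();
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // java expands the "bin/*" class path wildcard itself, so no shell is needed.
        List<String> command = List.of(
                "java", "-cp", "bin/*", "org.apache.tika.TikaCLI", "mock_example.xml");
        System.out.println(run(command, 60_000));
    }
}
```

Because the parser runs in a separate JVM, a hang or an !OutOfMemory error there cannot leave the parent process in an unreliable state. The trade-off is the cost of starting a process per document or per batch.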
=== Mock options ===
See the mock example.xml file in
This shows all of the examples of what you can do.
<?xml version="1.0" encoding="UTF-8" ?>
<!-- this file offers all of the options as documentation.
Parsing will stop at an IOException, of course
-->
<!-- action can be "add" or "set" -->
<metadata action="add" name="author">Nikolai Lobachevsky</metadata>
<!-- element is the name of the sax event to write, p=paragraph
if the element is not specified, the default is <p> -->
<write element="p">some content</write>
<!-- write something to System.out -->
<print_out>writing to System.out</print_out>
<!-- write something to System.err -->
<print_err>writing to System.err</print_err>
<!-- hang
millis: how many milliseconds to pause. The actual hang time will
be a bit longer than the value specified.
heavy: whether or not the hang should do something computationally
expensive. If the value is false, this just does a Thread.sleep(millis).
This attribute is optional, with default of heavy=false.
pulse_millis: (required if "heavy" is true), how often to check to see
whether the thread was interrupted or that the total hang time
exceeded the millis
interruptible: whether or not the parser will check to see if its thread
has been interrupted; this attribute is optional with default of
-->
<hang millis="100" heavy="true" pulse_millis="10" interruptible="true" />
<!-- throw an exception or error; optionally include a message or not -->
<throw class="java.io.IOException">not another IOException</throw>
<!-- perform a genuine !OutOfMemoryError -->