The "MockParser" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/MockParser?action=diff&rev1=1&rev2=2

  == Background ==
  So, you've tried Tika on a couple of files and all works well.  Problem 
solved!
  
+ No. 
+ 
- No. In very rare cases, Tika can do some really bad things.  We try to fix 
these problems when we can, but if history is any indication (e.g. 
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are 
processing millions of files, you'll need to defend against:
+ In very rare cases, Tika can do some really bad things.  We try to fix these 
problems when we can, but if history is any indication (e.g. 
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are 
processing millions/billions of files from the wild, you'll need to defend 
against:
  
   1. Regular catchable exceptions
   2. !OutOfMemory errors, which can put the JVM in an unreliable state
@@ -24, +26 @@

  `java -cp "bin/*" org.apache.tika.TikaCLI mock_example.xml`
  
  === Tika-server ===
- Place the tika-server.jar and the tika-core.tests.jar in a "bin directory.
+ Place the tika-server.jar and the tika-core.tests.jar in a "bin" directory.
  
- `java -cp "serverbin/*" org.apache.tika.server.TikaServerCli`
+ `java -cp "bin/*" org.apache.tika.server.TikaServerCli`
+ 
+ Then curl away:
+ 
+ `curl -T mock_example.xml http://localhost:9998/rmeta/text`
  
  === Your Framework ===
  Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and 
then add some mock.xml files to your batch of documents.
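
As a rough illustration of what a defensive wrapper in your own framework might look like, here is a minimal Java sketch that runs one parse on a worker thread and classifies the result. Everything in it is an assumption made for illustration: `GuardedParse`, `Outcome` and `run` are hypothetical names, the `Callable` stands in for whatever Tika call your framework makes, and the time limit is an extra guard that this page does not prescribe.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika.
public class GuardedParse {

    /** Possible results of one guarded parse attempt. */
    public enum Outcome { OK, EXCEPTION, OOM, TIMEOUT }

    /**
     * Runs parseTask (a stand-in for a real Tika parse call) on a
     * worker thread and reports how it ended.
     */
    public static Outcome run(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // The task is still running; ask it to stop via interruption.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // Anything thrown by the task, including Errors, arrives as the cause.
            return (e.getCause() instanceof OutOfMemoryError)
                    ? Outcome.OOM
                    : Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            // The calling thread was interrupted while waiting.
            Thread.currentThread().interrupt();
            return Outcome.EXCEPTION;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulated tasks; a real test would feed mock.xml files to the parser.
        System.out.println(run(() -> "some text", 1000));
        System.out.println(run(() -> { throw new java.io.IOException("simulated"); }, 1000));
        System.out.println(run(() -> { throw new OutOfMemoryError("simulated"); }, 1000));
        System.out.println(run(() -> { Thread.sleep(5000); return "late"; }, 100));
    }
}
```

Note that this sketch only asks the worker thread to stop by interrupting it, so code that ignores interruption would keep running. Since an !OutOfMemory error can leave the JVM in an unreliable state, a wrapper like this is probably best treated as a way to detect the problem and restart, rather than as a way to recover in place.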
  
  
- 
- Then curl away:
- 
- `curl -T mock_example.xml http://localhost:9998/rmeta/text`
  
  === Mock options ===
  See the mock example.xml file in 
tika-parsers/src/test/resources/test-documents/mock.  
@@ -84, +86 @@

  </mock>
  
  ``
+ == References ==
+  1. [[http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-web-content-nanite/|Tika to Ride]]
+  2. [[http://events.linuxfoundation.org/sites/events/files/slides/TikaEval_ACNA15_allison_herceg_v2.pdf|Evaluating Text Extraction]]
  
