Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=1&rev2=2

  = Usage =
  See TikaBatchOverview for a general design overview of tika-batch.
  
+ This is all still very much in development.  The code is currently available [[https://github.com/tballison/tika/tree/TIKA-1302|here]].  The current goal is to get it into good enough shape to make it into Tika 1.8.
+ 
+ 
  == TikaBatch FileSystem (FS) ==
- To be filled in...
+ For expert users who don't want to use tika-app or who might want to write custom extensions, there are example driver files and logging config files available [[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].
  
- == TikaBatch via TikaApp ==
+ == TikaBatch via tika-app-X.X.jar ==
- Not yet implemented...want to contribute?
+ There is an initial integration with tika-app on a GitHub [[https://github.com/tballison/tika/tree/TIKA-1302|fork]].
+ 
+ You can see the commandline arguments via the regular "-?" or "--help" options.  The help output has a separate section at the end for tika-batch options.
+ 
+ In the current dev version, tika-app decides whether it is in batch mode based on either of two signals:
+ 1. The final argument in the commandline args is a directory
+ 2. -srcDir is specified in the commandline
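
The two signals above can be sketched in a few lines of Java. This is only an illustration of the rule as described on this page, not the code in the fork: the class name `BatchModeSketch` and the method `isBatchMode` are hypothetical, and the real check in tika-app may differ in detail.

```java
import java.io.File;

// Hypothetical sketch of the batch-mode decision described above.
// Not taken from tika-app; names are made up for illustration.
final class BatchModeSketch {

    static boolean isBatchMode(String[] args) {
        if (args == null || args.length == 0) {
            return false;
        }
        // Signal 2: -srcDir appears anywhere in the commandline.
        for (String arg : args) {
            if ("-srcDir".equals(arg)) {
                return true;
            }
        }
        // Signal 1: the final argument is an existing directory.
        return new File(args[args.length - 1]).isDirectory();
    }

    public static void main(String[] args) {
        System.out.println("batch mode: " + isBatchMode(args));
    }
}
```

Under this sketch, a commandline ending in a regular file or a URL would not trigger batch mode unless -srcDir is also present.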
+ 
+ Once the app knows that it is in batch mode, it converts some of the traditional tika-app commandline arguments for use by org.apache.tika.batch.fs.FSBatchProcessCLI.
+ 
+ Some examples:
+ 
+  *Most basic (with output to a directory called "output"):
+ 
+       java -jar tika-app-X.Y.jar <inputDirectory>
+ 
+  *Set the number of file consumer threads:
+ 
+       java -jar tika-app-X.Y.jar -numConsumers 10 <inputDirectory>
+ 
+  *Specify input and output directories:
+ 
+       java -jar tika-app-X.Y.jar -srcDir /mydata/src/dir -targDir /mydata/output/dir
+ 
+  *Specify JVM args to be used by the child process (prepend a "J" to the regular args, so -Xmx2g becomes -JXmx2g):
+ 
+       java -jar tika-app-X.Y.jar -JXmx2g -JDlog4j.configuration=file:bin/log4j.xml <inputDirectory>
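
To make the "J" convention in the last example concrete, here is a minimal Java sketch of how such arguments could be mapped back to regular JVM arguments for the child process. It assumes the translation simply drops the "J" after the leading dash; the class `ChildJvmArgsSketch` and its method are hypothetical and are not the implementation in the fork.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect "-J..." arguments and turn them into
// ordinary JVM arguments for a child process. Not taken from tika-app.
final class ChildJvmArgsSketch {

    static List<String> childJvmArgs(String[] args) {
        List<String> jvmArgs = new ArrayList<>();
        for (String arg : args) {
            // "-JXmx2g" -> "-Xmx2g"; a bare "-J" is ignored.
            if (arg.startsWith("-J") && arg.length() > 2) {
                jvmArgs.add("-" + arg.substring(2));
            }
        }
        return jvmArgs;
    }

    public static void main(String[] args) {
        System.out.println(childJvmArgs(args));
    }
}
```

Arguments without the "-J" prefix, such as the input directory, are left out of the returned list and would be handled by the rest of the commandline processing.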
  
  == TikaBatch Server ==
  Module not yet implemented...want to contribute?
