Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaBatchUsage" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaBatchUsage?action=diff&rev1=1&rev2=2

  = Usage =
  See TikaBatchOverview for a general design overview of tika-batch.

+ This is all still very much in a dev state. The code is currently available [[https://github.com/tballison/tika/tree/TIKA-1302|here]]. The current goal is to get this into decent enough shape to make it into Tika 1.8.
+
+
  == TikaBatch FileSystem (FS) ==
- To be filled in...
+ For expert users who don't want to use tika-app or who might want to write custom extensions, example driver files and logging config files are available [[https://github.com/tballison/tika/tree/TIKA-1302/tika-batch/src/main/examples|here]].

- == TikaBatch via TikaApp ==
+ == TikaBatch via tika-app-X.X.jar ==
- Not yet implemented...want to contribute?
+ There is an initial integration with tika-app on a github [[https://github.com/tballison/tika/tree/TIKA-1302|fork]].
+
+ You can see the commandline arguments via the regular "-?" or "--help" commands. There is a separate section at the end for tika-batch options.
+
+ In the current dev version, tika-app decides whether it is in batch mode based on one of two signals:
+  1. The final argument in the commandline args is a directory
+  2. -srcDir is specified in the commandline
+
+ Once the app knows that it is in batch mode, it converts some of the traditional tika-app commandline arguments for use by org.apache.tika.batch.fs.FSBatchProcessCLI.
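
The two batch-mode signals listed above can be sketched roughly as follows. This is only an illustration of the described rule, not the actual tika-app code: the class name `BatchModeSketch` and the method `isBatchMode` are hypothetical, and the real implementation on the TIKA-1302 branch may check the arguments differently.

```java
import java.io.File;

// Hypothetical sketch of the batch-mode check described above.
// Not taken from the tika-app source; names are illustrative only.
class BatchModeSketch {

    // Returns true if either of the two described signals is present.
    static boolean isBatchMode(String[] args) {
        if (args.length == 0) {
            return false;
        }
        // Signal 2: -srcDir is specified anywhere in the commandline.
        for (String arg : args) {
            if ("-srcDir".equals(arg)) {
                return true;
            }
        }
        // Signal 1: the final argument is an existing directory.
        return new File(args[args.length - 1]).isDirectory();
    }

    public static void main(String[] args) {
        System.out.println(isBatchMode(args) ? "batch mode" : "regular mode");
    }
}
```

Checking for the explicit flag first is a choice made for this sketch; since either signal is sufficient, the order does not change the result.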
+
+ Some examples:
+
+ *Most basic (with output to a directory called "output"):
+
+ java -jar tika-app.X.Y.jar <inputDirectory>
+
+ *Set the number of file consumer threads:
+
+ java -jar tika-app.X.Y.jar -numConsumers 10 <inputDirectory>
+
+ *Specify input and output directories:
+
+ java -jar tika-app.X.Y.jar -srcDir /mydata/src/dir -targDir /mydata/output/dir
+
+ *Specify jvm args to be used by the child process (prepend a "J" to the regular args):
+
+ java -jar tika-app.X.Y.jar -JXmx2g -JDlog4j.configuration=file:bin/log4j.xml <inputDirectory>

  == TikaBatch Server ==
  Module not yet implemented...want to contribute?
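
As a note on the "J" prefix used in the tika-app example for child-process jvm args: one plausible reading is that the leading "J" is stripped before the argument is handed to the child JVM, so that -JXmx2g becomes -Xmx2g. The sketch below illustrates that reading only. The class `ChildJvmArgsSketch` and the method `childJvmArgs` are hypothetical, and how the actual code translates these arguments is an assumption, not something the page confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "J" prefix convention; not the tika-app code.
class ChildJvmArgsSketch {

    // Collects arguments of the form -J<rest> and rewrites them to -<rest>.
    // All other arguments are ignored by this sketch.
    static List<String> childJvmArgs(String[] args) {
        List<String> jvmArgs = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("-J") && arg.length() > 2) {
                jvmArgs.add("-" + arg.substring(2));
            }
        }
        return jvmArgs;
    }

    public static void main(String[] args) {
        for (String jvmArg : childJvmArgs(args)) {
            System.out.println(jvmArg);
        }
    }
}
```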
