Adrian Bird created TIKA-4737:
---------------------------------
Summary: tika-4.0.0-alpha1 - Batch mode is confusing
Key: TIKA-4737
URL: https://issues.apache.org/jira/browse/TIKA-4737
Project: Tika
Issue Type: Bug
Environment: Windows 11 with Java 17
Reporter: Adrian Bird
Looking at the documentation, I find it very confusing how to use what I'll
call 'standard' mode vs 'batch' mode.
# [Batch
Processing|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_batch_processing_tika_async_cli]
says 'For processing large numbers of files, use {{tika-async-cli}}. It
uses the Tika Pipes architecture with forked JVM processes for fault tolerance.'
The examples use 'tika-async-cli.jar', which doesn't exist; the example does
run with 'tika-app.jar'.
# When using 'tika-app.jar', it is not clear what makes it run in 'batch' or
'standard' mode. My assumption is that it is the presence of the '-i' and '-o'
options.
# The help from the 'batch' process differs quite a lot from the options
specified in the Batch Processing page above and in the 'standard' help output.
# The Batch Processing page above doesn't say anything about how to use a
config file, but the help does.
# It is confusing to have two different ways of specifying the config file,
depending on whether you are using the 'standard' '--config=file.json' or the
'batch' '-c file.json'.
# It would also be useful if a message were output saying whether it is
running in 'standard' or 'batch' mode.
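
To make the assumption in item 2 and the request in item 6 concrete, here is a minimal sketch in Python. It is purely illustrative and not Tika's actual logic: the function names {{detect_mode}} and {{mode_banner}} are hypothetical, and the rule "batch mode when both '-i' and '-o' are present" is only the assumption stated above, not confirmed behaviour.

```python
from typing import Sequence


def detect_mode(args: Sequence[str]) -> str:
    """Guess the mode from the command-line arguments.

    Hypothetical rule taken from the assumption in this report:
    'batch' when both '-i' and '-o' are present, otherwise 'standard'.
    """
    has_input = "-i" in args
    has_output = "-o" in args
    return "batch" if has_input and has_output else "standard"


def mode_banner(args: Sequence[str]) -> str:
    """Build the kind of one-line message requested in item 6."""
    return f"Running in {detect_mode(args)} mode"


if __name__ == "__main__":
    # Argument lists modelled on the options mentioned in this report.
    print(mode_banner(["-i", "input_dir", "-o", "output_dir", "-c", "file.json"]))
    print(mode_banner(["--config=file.json", "document.pdf"]))
```

A single line like this at startup would remove the guesswork about which help text and which config option apply.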