[ https://issues.apache.org/jira/browse/TIKA-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083498#comment-18083498 ]
ASF GitHub Bot commented on TIKA-4737:
--------------------------------------
tballison merged PR #2836:
URL: https://github.com/apache/tika/pull/2836
> tika-4.0.0-alpha1 - Batch mode is confusing
> -------------------------------------------
>
> Key: TIKA-4737
> URL: https://issues.apache.org/jira/browse/TIKA-4737
> Project: Tika
> Issue Type: Bug
> Environment: Windows 11 with Java 17
> Reporter: Adrian Bird
> Priority: Major
>
> Looking at the documentation, I've found it very confusing to use what I'll
> call 'standard' mode vs 'batch' mode.
> # [Batch Processing|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_batch_processing_tika_async_cli]
> says 'For processing large numbers of files, use {{tika-async-cli}}. It
> uses the Tika Pipes architecture with forked JVM processes for fault
> tolerance.'
> The examples use 'tika-async-cli.jar', but this jar doesn't exist; the
> examples do, however, run with 'tika-app.jar'.
> # When using 'tika-app.jar', it is not clear what makes it run in 'batch' or
> 'standard' mode. My assumption is that it is the presence of the '-i' and
> '-o' options.
> # The help output in 'batch' mode differs considerably from both the options
> listed on the Batch Processing page above and the 'standard' help
> output.
> # The Batch Processing page above doesn't say anything about how to use a
> config file, but the help output does.
> # It is confusing to have two different ways of specifying the config file,
> depending on whether you are using 'standard' mode ('--config=file.json') or
> 'batch' mode ('-c file.json').
> # It would also be useful if a message were output indicating whether it is
> running in 'standard' or 'batch' mode.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)