[ 
https://issues.apache.org/jira/browse/TIKA-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083519#comment-18083519
 ] 

Hudson commented on TIKA-4737:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1386 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1386/])
TIKA-4737 -- improve docs for tika-pipes via tika-app (#2836) (github: 
[https://github.com/apache/tika/commit/4bfbdf22cf0d42f3661e5f0abcd42fd7e6190735])
* (edit) docs/modules/ROOT/pages/pipes/configuration.adoc
* (edit) docs/modules/ROOT/pages/using-tika/cli/index.adoc
* (edit) docs/modules/ROOT/pages/pipes/cpu-sizing.adoc
* (edit) docs/modules/ROOT/pages/migration-to-4x/index.adoc
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java


> tika-4.0.0-alpha1 - Batch mode is confusing
> -------------------------------------------
>
>                 Key: TIKA-4737
>                 URL: https://issues.apache.org/jira/browse/TIKA-4737
>             Project: Tika
>          Issue Type: Bug
>         Environment: Windows 11 with Java 17
>            Reporter: Adrian Bird
>            Priority: Major
>
> Looking at the documentation, I've found it very confusing how to use what I'll 
> call 'standard' mode vs 'batch' mode.
>  # [Batch 
> Processing|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_batch_processing_tika_async_cli]
>  says 'For processing large numbers of files, use {{tika-async-cli}}. It 
> uses the Tika Pipes architecture with forked JVM processes for fault 
> tolerance.'
> The examples use 'tika-async-cli.jar', which doesn't exist, but the 
> example does run with 'tika-app.jar'.
>  # When using 'tika-app.jar', it is not clear what makes it run in 'batch' or 
> 'standard' mode. My assumption is that it is the presence of the '-i' and 
> '-o' options.
>  # The help from the 'batch' process differs quite a lot from the options 
> specified in the Batch Processing page above and in the 'standard' help 
> output.
>  # The Batch Processing page above doesn't say anything about how to use a 
> config file, but the help does. 
>  # It is confusing to have two different ways of specifying the config file, 
> depending on whether you are using the 'standard' '--config=file.json' or 'batch' 
> '-c file.json'.
>  # It would also be useful if a message was output saying whether it was 
> 'standard' or 'batch' mode.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
