Adrian Bird created TIKA-4735:
---------------------------------

             Summary: tika-4.0.0-alpha1 - batch output contains JSON wrapper and metadata with --content-only
                 Key: TIKA-4735
                 URL: https://issues.apache.org/jira/browse/TIKA-4735
             Project: Tika
          Issue Type: Bug
    Affects Versions: 4.0.0
         Environment: Windows 11 with Java 17
            Reporter: Adrian Bird


The [Basic Batch Usage Documentation|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_basic_batch_usage] has this example:
{noformat}
java -jar tika-async-cli.jar -i /path/to/input -o /path/to/output -h m --content-only
{noformat}
and description:
This produces .md files in the output directory containing just the extracted 
markdown content — no JSON wrappers, no metadata fields.

The example doesn't work as written because -h means help, and -h is listed as 
such in the options section.
The help output that it produced lists only '--handler' for the handler option.

My actual issue is with the output of the batch processing. My example:
{noformat}
%JAVA_HOME%\bin\java -jar %TIKA_JAR% -i Input -o Output --handler m --content-only
{noformat}
This creates a .md file, but the file has a JSON wrapper and metadata fields, 
and the content isn't plain text.

I get a JSON wrapper and metadata for all the --handler formats.

Also, if I remove the --content-only argument I get a .json file and not a .md 
file.
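
For anyone reproducing this, the check can be automated. The following is a minimal Python sketch, not part of Tika: it scans an output directory (here "Output", as in the example above) and reports any .md file whose whole content parses as JSON. The function names looks_like_json_wrapper and find_wrapped_files are hypothetical, and the sketch assumes the wrapper is a single JSON object or array spanning the entire file.

```python
import json
from pathlib import Path


def looks_like_json_wrapper(text: str) -> bool:
    """Return True if the whole text parses as a JSON object or array."""
    stripped = text.strip()
    # Cheap pre-check: a JSON wrapper must start with '{' or '['.
    if not stripped or stripped[0] not in "{[":
        return False
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Ordinary markdown (even a line starting with '[') ends up here.
        return False
    return isinstance(parsed, (dict, list))


def find_wrapped_files(output_dir: str, suffix: str = ".md") -> list[Path]:
    """List files under output_dir with the given suffix that are JSON."""
    wrapped = []
    for path in sorted(Path(output_dir).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if looks_like_json_wrapper(text):
            wrapped.append(path)
    return wrapped


if __name__ == "__main__":
    # "Output" is the directory passed to -o in the example above.
    for path in find_wrapped_files("Output"):
        print(f"JSON wrapper found: {path}")
```

If --content-only behaved as documented, this should report nothing for the .md files; any path it prints is a file that still carries a wrapper.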

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
