[
https://issues.apache.org/jira/browse/TIKA-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083908#comment-18083908
]
ASF GitHub Bot commented on TIKA-4734:
--------------------------------------
Copilot commented on code in PR #2843:
URL: https://github.com/apache/tika/pull/2843#discussion_r3312314661
##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -604,27 +614,27 @@ public void process(String arg) throws Exception {
System.out.println(localConfig.getConfig().toString());
}*/
- private void convertConfigXmlToJson(String paths) throws Exception {
- String[] parts = paths.split(",");
- if (parts.length != 2) {
- System.err.println("Error: --convert-config-xml-to-json requires input and output paths separated by comma");
- System.err.println("Usage: --convert-config-xml-to-json=<input.xml>,<output.json>");
+ private void convertConfigXmlToJson(String inputPath) throws Exception {
+ if (inputPath == null || inputPath.trim().isEmpty()) {
+ System.err.println("Error: --convert-config-xml-to-json requires an input XML path");
+ System.err.println("Usage: --convert-config-xml-to-json=<input.xml> > <output.json>");
return;
}
- Path xmlPath = Paths.get(parts[0].trim());
- Path jsonPath = Paths.get(parts[1].trim());
+ Path xmlPath = Paths.get(inputPath.trim());
if (!Files.exists(xmlPath)) {
System.err.println("Error: Input XML file not found: " + xmlPath);
return;
}
Review Comment:
convertConfigXmlToJson() only checks `Files.exists(xmlPath)`. If the path
exists but is a directory (or otherwise not a regular readable file),
`Files.newInputStream(xmlPath)` will throw and you'll get a stack trace rather
than the friendly CLI error. Consider checking `Files.isRegularFile(xmlPath)`
(and possibly `Files.isReadable`) and emitting a clear error message before
opening the stream.
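
A minimal sketch of the check described above, assuming a hypothetical helper `validateInputFile` (it is not part of `TikaCLI`, and the error strings other than the existing "not found" message are illustrative). It returns a message for the CLI to print, or `null` when the path looks usable:

```java
import java.nio.file.Files;
import java.nio.file.Path;

class InputPathCheck {

    /**
     * Hypothetical helper: returns null when the path can be opened as an
     * input file, otherwise a user-facing error message.
     */
    static String validateInputFile(Path path) {
        if (!Files.exists(path)) {
            return "Error: Input XML file not found: " + path;
        }
        // A directory (or other non-regular file) exists but cannot be read as XML.
        if (!Files.isRegularFile(path)) {
            return "Error: Input XML path is not a regular file: " + path;
        }
        if (!Files.isReadable(path)) {
            return "Error: Input XML file is not readable: " + path;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("tika-cli-check");
        Path file = Files.createTempFile(dir, "tika-config", ".xml");
        System.out.println(validateInputFile(dir));   // directory: rejected with a message
        System.out.println(validateInputFile(file));  // regular readable file: null
        System.out.println(validateInputFile(dir.resolve("missing.xml")));
    }
}
```

In `convertConfigXmlToJson()` the caller would print the returned message to `System.err` and return before calling `Files.newInputStream(xmlPath)`.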
##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -771,16 +781,18 @@ private void usage() {
out.println();
out.println(" -g or --gui Start the Apache Tika GUI");
out.println();
- out.println(" --config=<tika-config.xml>");
- out.println(" TikaConfig file. Must be specified before -g, -s, -f or the dump-x-config !");
+ out.println(" --config=<tika-config.json>");
+ out.println(" TikaConfig file (JSON as of Tika 4.x). Must be specified before -g, -s or -f !");
Review Comment:
The usage text says the config must be specified before `-s`, but
`-s/--server` is explicitly unsupported in this CLI (process() throws
IllegalArgumentException for it). Consider removing `-s` from this message to
avoid confusing users, and keep the wording aligned with actual supported
options.
##########
tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java:
##########
@@ -760,6 +760,25 @@ public void testListParserDetailApt() throws Exception {
assertTrue(content.contains("application/vnd.oasis.opendocument.text-web"));
}
+ /**
+ * Tests --convert-config-xml-to-json with no separate config file.
+ * Regression test for TIKA-4734: the flag used to be misrouted to async
+ * mode (the input arg ended in ".json"), failing with a TikaConfigException
+ * unless a --config was also passed. It must now run standalone and write
+ * the converted JSON to stdout.
+ */
+ @Test
+ public void testConvertConfigXmlToJson() throws Exception {
+ String xmlPath = Paths.get(getClass().getResource("/xml-configs/tika-config-simple.xml").toURI()).toString();
+ String content = getParamOutContent("--convert-config-xml-to-json=" + xmlPath);
+
+ // stdout should contain the converted JSON (and only the JSON)
+ assertTrue(content.contains("\"parsers\""), "Expected JSON parsers section, got: " + content);
+ assertTrue(content.contains("pdf-parser"), "Expected pdf-parser in output, got: " + content);
+ assertTrue(content.contains("\"sortByPosition\" : true"), "Expected converted param, got: " + content);
+ assertTrue(content.trim().startsWith("{"), "Output should be pure JSON, got: " + content);
Review Comment:
This test asserts specific pretty-printed JSON substrings (including spacing
like `"sortByPosition" : true`), which is brittle if the JSON formatter
changes. To make the regression test more stable, consider parsing `content` as
JSON and asserting on the resulting structure/values (e.g., that the parsers
array includes `pdf-parser` with `sortByPosition=true`).
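
Parsing `content` with a JSON library (for example Jackson's `ObjectMapper.readTree`) is the more robust route. As a dependency-free illustration of the same concern, the sketch below uses a different, lighter technique: it strips whitespace outside string literals so that substring assertions no longer depend on pretty-printer spacing. The helper name `compact` and the sample JSON shape are assumptions; the actual structure of the converter's output is not shown in this thread.

```java
class JsonWhitespace {

    /**
     * Hypothetical test helper: removes whitespace that sits outside JSON
     * string literals, so {"a" : true} and {"a":true} compare equal.
     * Whitespace inside string values is preserved.
     */
    static String compact(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;          // character after a backslash is literal
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;         // unescaped quote closes the string
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample only; not the real converter output.
        String pretty = "{\n  \"parsers\" : [ {\n    \"pdf-parser\" : {\n"
                + "      \"sortByPosition\" : true\n    }\n  } ]\n}";
        String normalized = compact(pretty);
        System.out.println(normalized);
        System.out.println(normalized.contains("\"sortByPosition\":true"));
    }
}
```

In the test, `compact(content).contains("\"sortByPosition\":true")` would then hold for both `"sortByPosition" : true` and `"sortByPosition":true`. This only removes the spacing dependency; it does not verify that `sortByPosition` belongs to `pdf-parser`, which needs a real parse.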
> tika-4.0.0-alpha1 - convert-config-xml-to-json fails if no config specified
> ---------------------------------------------------------------------------
>
> Key: TIKA-4734
> URL: https://issues.apache.org/jira/browse/TIKA-4734
> Project: Tika
> Issue Type: Bug
> Affects Versions: 4.0.0
> Environment: Windows 11 with Java 17.
> Reporter: Adrian Bird
> Priority: Major
>
> I tried to convert my V3 config.xml files into V4 config.json files and
> looked at the example in the
> [documentation|https://tika.apache.org/docs/4.0.0-SNAPSHOT/migration-to-4x/migrating-to-4x.html#_configuration_xml_to_json]
> {code:java}
> java -jar tika-app.jar --convert-config-xml-to-json=tika-config.xml,tika-config.json{code}
> I got this error when trying it with my file:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: Failed to load config from: --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> I got it to work if I specified a config file in the command:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% -config=config-template.json --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> *Is it meant to work without a config file?*
> I'll mention something else here, but can create a new Jira if you want. The
> output from using '--help' contains this line:
> {noformat}
> --config=<tika-config.xml>{noformat}
> which I assume should be a .json file.
> There is also a 'tika-config.xml' mentioned in the batch help output.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)