[
https://issues.apache.org/jira/browse/TIKA-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084058#comment-18084058
]
ASF GitHub Bot commented on TIKA-4734:
--------------------------------------
tballison commented on code in PR #2843:
URL: https://github.com/apache/tika/pull/2843#discussion_r3316821970
##########
tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java:
##########
@@ -760,6 +760,25 @@ public void testListParserDetailApt() throws Exception {
assertTrue(content.contains("application/vnd.oasis.opendocument.text-web"));
}
+ /**
+ * Tests --convert-config-xml-to-json with no separate config file.
+ * Regression test for TIKA-4734: the flag used to be misrouted to async
+ * mode (the input arg ended in ".json"), failing with a TikaConfigException
+ * unless a --config was also passed. It must now run standalone and write
+ * the converted JSON to stdout.
+ */
+ @Test
+ public void testConvertConfigXmlToJson() throws Exception {
+ String xmlPath = Paths.get(getClass().getResource("/xml-configs/tika-config-simple.xml").toURI()).toString();
+ String content = getParamOutContent("--convert-config-xml-to-json=" + xmlPath);
+
+ // stdout should contain the converted JSON (and only the JSON)
+ assertTrue(content.contains("\"parsers\""), "Expected JSON parsers section, got: " + content);
+ assertTrue(content.contains("pdf-parser"), "Expected pdf-parser in output, got: " + content);
+ assertTrue(content.contains("\"sortByPosition\" : true"), "Expected converted param, got: " + content);
+ assertTrue(content.trim().startsWith("{"), "Output should be pure JSON, got: " + content);
Review Comment:
Agreed. I rewrote the assertions to parse the output with `ObjectMapper` and
check structure rather than pretty-printed substrings: `parsers` must be a
non-empty array; the array must contain an entry with a `pdf-parser` key; and
that entry's `sortByPosition` (resolved via `findValue`) must be `true`. The
test no longer depends on formatter spacing.
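
For readers following along, here is a minimal sketch of the kind of structural check described above. It is not the code from the PR: the class name `ConvertedConfigChecks` and the method `hasSortedPdfParser` are hypothetical, and the JSON shape (a top-level `parsers` array whose entries are keyed by parser name) is assumed from the description in this comment. It uses only the Jackson `ObjectMapper` and `JsonNode` APIs.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical helper, not part of Tika: illustrates structural JSON checks.
public class ConvertedConfigChecks {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Returns true if the JSON has a non-empty "parsers" array that contains
     * an entry with a "pdf-parser" key whose "sortByPosition" value is the
     * boolean true. Throws if the input is not well-formed JSON.
     */
    public static boolean hasSortedPdfParser(String json) throws Exception {
        JsonNode root = MAPPER.readTree(json);
        if (root == null) {
            return false;
        }
        JsonNode parsers = root.get("parsers");
        if (parsers == null || !parsers.isArray() || parsers.size() == 0) {
            return false;
        }
        for (JsonNode entry : parsers) {
            if (!entry.has("pdf-parser")) {
                continue;
            }
            // findValue searches this node and its descendants by field name.
            JsonNode sort = entry.findValue("sortByPosition");
            if (sort != null && sort.isBoolean() && sort.booleanValue()) {
                return true;
            }
        }
        return false;
    }
}
```

Because the check works on the parsed tree, it gives the same answer for compact and pretty-printed output, which is the point of moving away from substring matches such as `"sortByPosition" : true`.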
> tika-4.0.0-alpha1 - convert-config-xml-to-json fails if no config specified
> ---------------------------------------------------------------------------
>
> Key: TIKA-4734
> URL: https://issues.apache.org/jira/browse/TIKA-4734
> Project: Tika
> Issue Type: Bug
> Affects Versions: 4.0.0
> Environment: Windows 11 with Java 17.
> Reporter: Adrian Bird
> Priority: Major
>
> I tried to convert my V3 config.xml files into V4 config.json files and
> looked at the example in the
> [documentation|https://tika.apache.org/docs/4.0.0-SNAPSHOT/migration-to-4x/migrating-to-4x.html#_configuration_xml_to_json]
> {code:java}
> java -jar tika-app.jar --convert-config-xml-to-json=tika-config.xml,tika-config.json{code}
> I got this error when trying it with my file:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: Failed to load config from: --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> I got it to work if I specified a config file in the command:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% -config=config-template.json --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> *Is it meant to work without a config file?*
> I'll mention something else here, but can create a new Jira if you want. The
> output from using '--help' contains this line:
> {noformat}
> --config=<tika-config.xml>{noformat}
> which I assume should be a .json file.
> There is also a 'tika-config.xml' mentioned in the batch help output.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)