[ 
https://issues.apache.org/jira/browse/TIKA-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084066#comment-18084066
 ] 

ASF GitHub Bot commented on TIKA-4734:
--------------------------------------

tballison commented on code in PR #2843:
URL: https://github.com/apache/tika/pull/2843#discussion_r3316821970


##########
tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java:
##########
@@ -760,6 +760,25 @@ public void testListParserDetailApt() throws Exception {
         assertTrue(content.contains("application/vnd.oasis.opendocument.text-web"));
     }
 
+    /**
+     * Tests --convert-config-xml-to-json with no separate config file.
+     * Regression test for TIKA-4734: the flag used to be misrouted to async
+     * mode (the input arg ended in ".json"), failing with a TikaConfigException
+     * unless a --config was also passed. It must now run standalone and write
+     * the converted JSON to stdout.
+     */
+    @Test
+    public void testConvertConfigXmlToJson() throws Exception {
+        String xmlPath = Paths.get(getClass().getResource("/xml-configs/tika-config-simple.xml").toURI()).toString();
+        String content = getParamOutContent("--convert-config-xml-to-json=" + xmlPath);
+
+        // stdout should contain the converted JSON (and only the JSON)
+        assertTrue(content.contains("\"parsers\""), "Expected JSON parsers section, got: " + content);
+        assertTrue(content.contains("pdf-parser"), "Expected pdf-parser in output, got: " + content);
+        assertTrue(content.contains("\"sortByPosition\" : true"), "Expected converted param, got: " + content);
+        assertTrue(content.trim().startsWith("{"), "Output should be pure JSON, got: " + content);

Review Comment:
   Agreed. I rewrote the assertions to parse the output with `ObjectMapper` and
   check structure rather than pretty-printed substrings: `parsers` must be a
   non-empty array, the array must contain an entry with a `pdf-parser` key, and
   that entry's `sortByPosition` (resolved via `findValue`) must be `true`. The
   test no longer depends on formatter spacing.
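
To illustrate the formatter-spacing dependency mentioned in the comment above, here is a minimal stdlib-only sketch. The two JSON strings are hypothetical sample outputs, not actual Tika output, and the regex check is only a stand-in for real parsing; the PR itself is described as using Jackson's `ObjectMapper`, which this sketch does not reproduce.

```java
import java.util.regex.Pattern;

class SpacingDependencyDemo {
    // Hypothetical sample output, pretty-printed with spaces around each colon.
    static final String PRETTY =
            "{\n  \"parsers\" : [ {\n    \"pdf-parser\" : {\n"
            + "      \"sortByPosition\" : true\n    }\n  } ]\n}";

    // The same hypothetical document serialized without optional whitespace.
    static final String COMPACT =
            "{\"parsers\":[{\"pdf-parser\":{\"sortByPosition\":true}}]}";

    // Stand-in for a structural check: accepts any whitespace around the colon.
    // A real JSON parser is the more robust choice; this only shows the idea.
    private static final Pattern SORT_BY_POSITION =
            Pattern.compile("\"sortByPosition\"\\s*:\\s*true");

    // Mirrors the original assertion: matches only one exact spacing style.
    static boolean substringCheck(String json) {
        return json.contains("\"sortByPosition\" : true");
    }

    static boolean whitespaceTolerantCheck(String json) {
        return SORT_BY_POSITION.matcher(json).find();
    }

    public static void main(String[] args) {
        System.out.println("substring check, pretty:  " + substringCheck(PRETTY));
        System.out.println("substring check, compact: " + substringCheck(COMPACT));
        System.out.println("tolerant check, pretty:   " + whitespaceTolerantCheck(PRETTY));
        System.out.println("tolerant check, compact:  " + whitespaceTolerantCheck(COMPACT));
    }
}
```

The substring check passes only for the pretty-printed form, so a change in the serializer's formatting would break the test even though the converted config is equivalent. Checking the parsed structure avoids that.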





> tika-4.0.0-alpha1 - convert-config-xml-to-json fails if no config specified
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-4734
>                 URL: https://issues.apache.org/jira/browse/TIKA-4734
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Windows 11 with Java 17.
>            Reporter: Adrian Bird
>            Priority: Major
>
> I tried to convert my V3 config.xml files into V4 config.json files and 
> looked at the example in the 
> [documentation|https://tika.apache.org/docs/4.0.0-SNAPSHOT/migration-to-4x/migrating-to-4x.html#_configuration_xml_to_json]
> {code:java}
> java -jar tika-app.jar --convert-config-xml-to-json=tika-config.xml,tika-config.json{code}
> I got this error when trying it with my file:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: Failed to load config from: --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> I got it to work if I specified a config file in the command:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% -config=config-template.json --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> *Is it meant to work without a config file?*
> I'll mention something else here, but can create a new Jira if you want. The 
> output from using '--help' contains this line:
> {noformat}
>     --config=<tika-config.xml>{noformat}
> which I assume should be a .json file.
> There is also a 'tika-config.xml' mentioned in the batch help output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
