[ 
https://issues.apache.org/jira/browse/TIKA-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083908#comment-18083908
 ] 

ASF GitHub Bot commented on TIKA-4734:
--------------------------------------

Copilot commented on code in PR #2843:
URL: https://github.com/apache/tika/pull/2843#discussion_r3312314661


##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -604,27 +614,27 @@ public void process(String arg) throws Exception {
         System.out.println(localConfig.getConfig().toString());
     }*/
 
-    private void convertConfigXmlToJson(String paths) throws Exception {
-        String[] parts = paths.split(",");
-        if (parts.length != 2) {
-            System.err.println("Error: --convert-config-xml-to-json requires 
input and output paths separated by comma");
-            System.err.println("Usage: 
--convert-config-xml-to-json=<input.xml>,<output.json>");
+    private void convertConfigXmlToJson(String inputPath) throws Exception {
+        if (inputPath == null || inputPath.trim().isEmpty()) {
+            System.err.println("Error: --convert-config-xml-to-json requires 
an input XML path");
+            System.err.println("Usage: 
--convert-config-xml-to-json=<input.xml> > <output.json>");
             return;
         }
 
-        Path xmlPath = Paths.get(parts[0].trim());
-        Path jsonPath = Paths.get(parts[1].trim());
+        Path xmlPath = Paths.get(inputPath.trim());
 
         if (!Files.exists(xmlPath)) {
             System.err.println("Error: Input XML file not found: " + xmlPath);
             return;
         }

Review Comment:
   convertConfigXmlToJson() only checks `Files.exists(xmlPath)`. If the path 
exists but is a directory (or otherwise not a regular readable file), 
`Files.newInputStream(xmlPath)` will throw and you'll get a stack trace rather 
than the friendly CLI error. Consider checking `Files.isRegularFile(xmlPath)` 
(and possibly `Files.isReadable`) and emitting a clear error message before 
opening the stream.
   



##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -771,16 +781,18 @@ private void usage() {
         out.println();
         out.println("    -g  or --gui           Start the Apache Tika GUI");
         out.println();
-        out.println("    --config=<tika-config.xml>");
-        out.println("        TikaConfig file. Must be specified before -g, -s, 
-f or the dump-x-config !");
+        out.println("    --config=<tika-config.json>");
+        out.println("        TikaConfig file (JSON as of Tika 4.x). Must be 
specified before -g, -s or -f !");

Review Comment:
   The usage text says the config must be specified before `-s`, but 
`-s/--server` is explicitly unsupported in this CLI (process() throws 
IllegalArgumentException for it). Consider removing `-s` from this message to 
avoid confusing users, and keep the wording aligned with actual supported 
options.
   



##########
tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java:
##########
@@ -760,6 +760,25 @@ public void testListParserDetailApt() throws Exception {
         
assertTrue(content.contains("application/vnd.oasis.opendocument.text-web"));
     }
 
+    /**
+     * Tests --convert-config-xml-to-json with no separate config file.
+     * Regression test for TIKA-4734: the flag used to be misrouted to async
+     * mode (the input arg ended in ".json"), failing with a 
TikaConfigException
+     * unless a --config was also passed. It must now run standalone and write
+     * the converted JSON to stdout.
+     */
+    @Test
+    public void testConvertConfigXmlToJson() throws Exception {
+        String xmlPath = 
Paths.get(getClass().getResource("/xml-configs/tika-config-simple.xml").toURI()).toString();
+        String content = getParamOutContent("--convert-config-xml-to-json=" + 
xmlPath);
+
+        // stdout should contain the converted JSON (and only the JSON)
+        assertTrue(content.contains("\"parsers\""), "Expected JSON parsers 
section, got: " + content);
+        assertTrue(content.contains("pdf-parser"), "Expected pdf-parser in 
output, got: " + content);
+        assertTrue(content.contains("\"sortByPosition\" : true"), "Expected 
converted param, got: " + content);
+        assertTrue(content.trim().startsWith("{"), "Output should be pure 
JSON, got: " + content);

Review Comment:
   This test asserts specific pretty-printed JSON substrings (including spacing 
like `"sortByPosition" : true`), which is brittle if the JSON formatter 
changes. To make the regression test more stable, consider parsing `content` as 
JSON and asserting on the resulting structure/values (e.g., that the parsers 
array includes `pdf-parser` with `sortByPosition=true`).





> tika-4.0.0-alpha1 - convert-config-xml-to-json fails if no config specified
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-4734
>                 URL: https://issues.apache.org/jira/browse/TIKA-4734
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Windows 11 with Java 17.
>            Reporter: Adrian Bird
>            Priority: Major
>
> I tried to convert my V3 config.xml files into V4 config.json files and 
> looked at the example in the 
> [documentation|https://tika.apache.org/docs/4.0.0-SNAPSHOT/migration-to-4x/migrating-to-4x.html#_configuration_xml_to_json]
> {code:java}
> java -jar tika-app.jar 
> --convert-config-xml-to-json=tika-config.xml,tika-config.json{code}
> I got this error when trying it with my file:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR%  
> --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: 
> Failed to load config from: 
> --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> I got it to work if I specified a config file in the command:
> {code:java}
> %JAVA_HOME%\bin\java -jar %TIKA_JAR% -config=config-template.json 
> --convert-config-xml-to-json=tika-exif2-config.xml,tika-exif2-config.json{code}
> *Is it meant to work without a config file?*
> I'll mention something else here, but can create a new Jira if you want. The 
> output from using '–help' contains this line:
> {noformat}
>     --config=<tika-config.xml>{noformat}
> which I assume should be a .json file.
> There is also a 'tika-config.xml' mentioned in the batch help output.
>  
>  
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to