[ 
https://issues.apache.org/jira/browse/TIKA-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086390#comment-18086390
 ] 

ASF GitHub Bot commented on TIKA-4753:
--------------------------------------

Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363328003


##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -212,11 +219,25 @@ public void testTimeout() throws Exception {
 
         // Server should return 503 (Service Unavailable) for timeout
         assertEquals(503, response.getStatus());
+        assertErrorResponseStatus(response, "TIMEOUT");
 
         // Server should still be running - verify with a successful request
         testBaseline();
     }
 
+    /**
+     * Asserts that an error response body is JSON with a {@code status} field 
matching
+     * {@code expectedStatus} (a {@code PipesResult.RESULT_STATUS} enum name).
+     */
+    private void assertErrorResponseStatus(Response response, String 
expectedStatus) throws IOException {
+        try (InputStream is = (InputStream) response.getEntity()) {
+            String body = IOUtils.toString(is, StandardCharsets.UTF_8);
+            JsonNode node = new ObjectMapper().readTree(body);
+            assertEquals(expectedStatus, node.path("status").asText(null),
+                    "Expected JSON error body with status=" + expectedStatus + 
" but got: " + body);
+        }
+    }

Review Comment:
   The new helper method is not indented consistently with the rest of the 
class, and it can reuse the existing static `UTF_8` import rather than 
`StandardCharsets.UTF_8`.



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -29,12 +29,15 @@
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
+import java.nio.charset.StandardCharsets;

Review Comment:
   After switching the helper to use the existing `UTF_8` static import, this 
`StandardCharsets` import becomes unused and should be removed to avoid 
compiler/linter warnings.



##########
docs/modules/ROOT/pages/using-tika/server/index.adoc:
##########
@@ -156,6 +156,41 @@ curl -T document.pdf 
http://localhost:9998/meta/Content-Type   # single field
 * `/translate/all/\{translator}/\{src}/\{dest}` — translation
 * `/pipes`, `/async` — Pipes-based bulk processing
 
+== Error Responses
+
+When parsing fails due to a process-level problem — the forked child process 
timed out,
+ran out of memory, or crashed unexpectedly — the server returns an HTTP error 
with a
+JSON body whose shape matches the `PipesResult` status:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----
+
+The `status` field is the `PipesResult.RESULT_STATUS` enum name. The `message` 
field is
+present when Tika provided one, absent otherwise.
+
+[cols="1,1,3"]
+|===
+|HTTP status |`status` values |Meaning
+
+|`503 Service Unavailable`
+|`TIMEOUT`, `OOM`, `UNSPECIFIED_CRASH`, `CLIENT_UNAVAILABLE_WITHIN_MS`
+|The forked parse process failed. The server is still healthy; the client may 
retry.

Review Comment:
   The 503 row includes `CLIENT_UNAVAILABLE_WITHIN_MS`, which is not a 
forked-process crash (it means no pipes client was available within the 
configured wait time). The meaning text should be broadened so it accurately 
covers all listed statuses.



##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
         return ".tmp";
     }
 
+    /**
+     * Builds a JSON error response whose shape matches PipesResult 
serialization:
+     * {@code {"status": "TIMEOUT", "message": "..."}}
+     * <p>
+     * This allows clients to distinguish failure modes (TIMEOUT, OOM, 
UNSPECIFIED_CRASH, …)
+     * without parsing plain-text bodies or inspecting custom headers.
+     */
+    private static Response buildProcessFailureResponse(PipesResult result) {
+        ObjectMapper mapper = new ObjectMapper();
+        ObjectNode node = mapper.createObjectNode();
+        node.put("status", result.status().name());
+        if (result.message() != null && !result.message().isBlank()) {
+            node.put("message", result.message());
+        }
+        String json;
+        try {
+            json = mapper.writeValueAsString(node);
+        } catch (Exception e) {
+            json = "{\"status\":\"" + result.status().name() + "\"}";
+        }
+        return Response.status(mapStatusToHttpResponse(result.status()))
+                .entity(json)
+                .type(MediaType.APPLICATION_JSON)
+                .build();
+    }
+
     /**
      * Processes the PipesResult and returns the metadata list.
      */
     private List<Metadata> processResult(PipesResult result) {
         if (result.isProcessCrash()) {
-            // Process crashed (OOM, timeout, etc.) - return 503
+            // Process crashed (OOM, timeout, unspecified crash) — 503 with 
JSON status body
             LOG.warn("Parse process crashed: {}", result.status());
-            throw new WebApplicationException(
-                    "Parse failed: " + result.status(),
-                    mapStatusToHttpResponse(result.status()));
+            throw new 
WebApplicationException(buildProcessFailureResponse(result));
         }
 
         if (result.isFatal() || result.isInitializationFailure()) {
-            // Fatal or initialization error - return 500
+            // Initialization/fatal error — JSON status body, HTTP status per 
mapStatusToHttpResponse
+            // (500, or 503 for CLIENT_UNAVAILABLE_WITHIN_MS)
             LOG.error("Parse initialization/fatal error: {} - {}",
                     result.status(), result.message());

Review Comment:
   The comment says initialization failures are always "500", but 
`mapStatusToHttpResponse` maps some initialization failures (e.g., 
`CLIENT_UNAVAILABLE_WITHIN_MS`) to `503`. Please reword the comment so it 
matches the actual status mapping.



##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
         return ".tmp";
     }
 
+    /**
+     * Builds a JSON error response whose shape matches PipesResult 
serialization:
+     * {@code {"status": "TIMEOUT", "message": "..."}}
+     * <p>
+     * This allows clients to distinguish failure modes (TIMEOUT, OOM, 
UNSPECIFIED_CRASH, …)
+     * without parsing plain-text bodies or inspecting custom headers.
+     */

Review Comment:
   This Javadoc says the response shape "matches PipesResult serialization", 
but the implementation returns only a subset of fields (status + optional 
message) and omits `emitData` (and omits `message` when blank). Please adjust 
the wording to avoid misleading API/documentation for clients.





> Improve msg on oom/timeout in tika-server's /tika/json endpoint
> ---------------------------------------------------------------
>
>                 Key: TIKA-4753
>                 URL: https://issues.apache.org/jira/browse/TIKA-4753
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to