Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363328003


##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -212,11 +219,25 @@ public void testTimeout() throws Exception {
 
         // Server should return 503 (Service Unavailable) for timeout
         assertEquals(503, response.getStatus());
+        assertErrorResponseStatus(response, "TIMEOUT");
 
         // Server should still be running - verify with a successful request
         testBaseline();
     }
 
+    /**
+     * Asserts that an error response body is JSON with a {@code status} field 
matching
+     * {@code expectedStatus} (a {@code PipesResult.RESULT_STATUS} enum name).
+     */
+    private void assertErrorResponseStatus(Response response, String 
expectedStatus) throws IOException {
+        try (InputStream is = (InputStream) response.getEntity()) {
+            String body = IOUtils.toString(is, StandardCharsets.UTF_8);
+            JsonNode node = new ObjectMapper().readTree(body);
+            assertEquals(expectedStatus, node.path("status").asText(null),
+                    "Expected JSON error body with status=" + expectedStatus + 
" but got: " + body);
+        }
+    }

Review Comment:
   The new helper method is not indented consistently with the rest of the 
class, and it can reuse the existing static `UTF_8` import rather than 
`StandardCharsets.UTF_8`.



##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -29,12 +29,15 @@
 import java.net.http.HttpClient;
 import java.net.http.HttpRequest;
 import java.net.http.HttpResponse;
+import java.nio.charset.StandardCharsets;

Review Comment:
   After switching the helper to use the existing `UTF_8` static import, this 
`StandardCharsets` import becomes unused and should be removed to avoid 
compiler/linter warnings.



##########
docs/modules/ROOT/pages/using-tika/server/index.adoc:
##########
@@ -156,6 +156,41 @@ curl -T document.pdf 
http://localhost:9998/meta/Content-Type   # single field
 * `/translate/all/\{translator}/\{src}/\{dest}` — translation
 * `/pipes`, `/async` — Pipes-based bulk processing
 
+== Error Responses
+
+When parsing fails due to a process-level problem — the forked child process 
timed out,
+ran out of memory, or crashed unexpectedly — the server returns an HTTP error 
with a
+JSON body whose shape matches the `PipesResult` status:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----
+
+The `status` field is the `PipesResult.RESULT_STATUS` enum name. The `message` 
field is
+present when Tika provided one, absent otherwise.
+
+[cols="1,1,3"]
+|===
+|HTTP status |`status` values |Meaning
+
+|`503 Service Unavailable`
+|`TIMEOUT`, `OOM`, `UNSPECIFIED_CRASH`, `CLIENT_UNAVAILABLE_WITHIN_MS`
+|The forked parse process failed. The server is still healthy; the client may 
retry.

Review Comment:
   The 503 row includes `CLIENT_UNAVAILABLE_WITHIN_MS`, which is not a 
forked-process crash (it means no pipes client was available within the 
configured wait time). The meaning text should be broadened so it accurately 
covers all listed statuses.



##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
         return ".tmp";
     }
 
+    /**
+     * Builds a JSON error response whose shape matches PipesResult 
serialization:
+     * {@code {"status": "TIMEOUT", "message": "..."}}
+     * <p>
+     * This allows clients to distinguish failure modes (TIMEOUT, OOM, 
UNSPECIFIED_CRASH, …)
+     * without parsing plain-text bodies or inspecting custom headers.
+     */
+    private static Response buildProcessFailureResponse(PipesResult result) {
+        ObjectMapper mapper = new ObjectMapper();
+        ObjectNode node = mapper.createObjectNode();
+        node.put("status", result.status().name());
+        if (result.message() != null && !result.message().isBlank()) {
+            node.put("message", result.message());
+        }
+        String json;
+        try {
+            json = mapper.writeValueAsString(node);
+        } catch (Exception e) {
+            json = "{\"status\":\"" + result.status().name() + "\"}";
+        }
+        return Response.status(mapStatusToHttpResponse(result.status()))
+                .entity(json)
+                .type(MediaType.APPLICATION_JSON)
+                .build();
+    }
+
     /**
      * Processes the PipesResult and returns the metadata list.
      */
     private List<Metadata> processResult(PipesResult result) {
         if (result.isProcessCrash()) {
-            // Process crashed (OOM, timeout, etc.) - return 503
+            // Process crashed (OOM, timeout, unspecified crash) — 503 with 
JSON status body
             LOG.warn("Parse process crashed: {}", result.status());
-            throw new WebApplicationException(
-                    "Parse failed: " + result.status(),
-                    mapStatusToHttpResponse(result.status()));
+            throw new 
WebApplicationException(buildProcessFailureResponse(result));
         }
 
         if (result.isFatal() || result.isInitializationFailure()) {
-            // Fatal or initialization error - return 500
+            // Initialization/fatal error — JSON status body, HTTP status per 
mapStatusToHttpResponse
+            // (500, or 503 for CLIENT_UNAVAILABLE_WITHIN_MS)
             LOG.error("Parse initialization/fatal error: {} - {}",
                     result.status(), result.message());

Review Comment:
   The comment says initialization failures are always "500", but 
`mapStatusToHttpResponse` maps some initialization failures (e.g., 
`CLIENT_UNAVAILABLE_WITHIN_MS`) to `503`. Please reword the comment so it 
matches the actual status mapping.



##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
         return ".tmp";
     }
 
+    /**
+     * Builds a JSON error response whose shape matches PipesResult 
serialization:
+     * {@code {"status": "TIMEOUT", "message": "..."}}
+     * <p>
+     * This allows clients to distinguish failure modes (TIMEOUT, OOM, 
UNSPECIFIED_CRASH, …)
+     * without parsing plain-text bodies or inspecting custom headers.
+     */

Review Comment:
   This Javadoc says the response shape "matches PipesResult serialization", 
but the implementation returns only a subset of fields (status + optional 
message) and omits `emitData` (and omits `message` when blank). Please adjust 
the wording to avoid misleading API/documentation for clients.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to