[ 
https://issues.apache.org/jira/browse/TIKA-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063199#comment-18063199
 ] 

ASF GitHub Bot commented on TIKA-4679:
--------------------------------------

Copilot commented on code in PR #2672:
URL: https://github.com/apache/tika/pull/2672#discussion_r2889591071


##########
tika-e2e-tests/tika-server/src/test/java/org/apache/tika/server/e2e/TikaServerHttp2Test.java:
##########
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.server.e2e;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.net.URI;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.time.Duration;
+import java.time.Instant;
+
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Assumptions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * End-to-end test verifying that tika-server-standard supports HTTP/2 (h2c 
cleartext).
+ *
+ * Starts the real fat-jar, sends a request using Java's HttpClient configured 
for HTTP/2,
+ * and asserts the response was served over HTTP/2. This validates the runtime 
classpath
+ * has the Jetty http2-server jar and CXF negotiates h2c correctly.
+ *
+ * Run with: mvn test -pl tika-e2e-tests/tika-server -Pe2e
+ *
+ * Inspired by Lawrence Moorehead's original contribution (elemdisc/tika PR#1, 
TIKA-4679).
+ */
+@Tag("E2ETest")
+public class TikaServerHttp2Test {
+
+    private static final Logger log = 
LoggerFactory.getLogger(TikaServerHttp2Test.class);
+    private static final long SERVER_STARTUP_TIMEOUT_MS = 90_000;
+    private static final String STATUS_PATH = "/status";
+
+    private Process serverProcess;
+    private int port;
+    private String endPoint;
+
+    @BeforeEach
+    void startServer() throws Exception {
+        port = findFreePort();
+        endPoint = "http://localhost:"; + port;
+
+        String jarPath = System.getProperty("tika.server.jar");
+        if (jarPath == null) {
+            // fall back to conventional location relative to this module
+            Path moduleDir = Paths.get("").toAbsolutePath();
+            Path repoRoot = moduleDir;
+            while (repoRoot != null && 
!repoRoot.resolve("tika-server").toFile().isDirectory()) {
+                repoRoot = repoRoot.getParent();
+            }
+            if (repoRoot == null) {
+                throw new IllegalStateException("Cannot locate tika root. Pass 
-Dtika.server.jar=/path/to/tika-server-standard.jar");
+            }
+            jarPath = 
repoRoot.resolve("tika-server/tika-server-standard/target")
+                    .toAbsolutePath()
+                    .toString() + "/tika-server-standard-" +
+                    System.getProperty("tika.version", "4.0.0-SNAPSHOT") + 
".jar";
+        }
+
+        Path jar = Paths.get(jarPath);
+        Assumptions.assumeTrue(Files.exists(jar),
+                "Fat-jar not found at " + jarPath + "; skipping HTTP/2 e2e 
test. " +
+                "Build with: mvn package -pl tika-server/tika-server-standard 
-DskipTests");
+
+        log.info("Starting tika-server-standard from: {}", jarPath);
+        ProcessBuilder pb = new ProcessBuilder(
+                "java", "-jar", jarPath,
+                "-p", String.valueOf(port),
+                "-h", "localhost"
+        );
+        pb.redirectErrorStream(true);
+        serverProcess = pb.start();
+
+        // Drain output in background so the process doesn't block
+        Thread drainThread = new Thread(() -> {
+            try (BufferedReader reader = new BufferedReader(
+                    new InputStreamReader(serverProcess.getInputStream(), 
UTF_8))) {
+                String line;
+                while ((line = reader.readLine()) != null) {
+                    log.debug("tika-server: {}", line);
+                }
+            } catch (Exception e) {
+                log.debug("Server output stream closed", e);
+            }
+        });
+        drainThread.setDaemon(true);
+        drainThread.start();
+
+        awaitServerStartup();
+    }
+

Review Comment:
   The `stopServer()` method calls `serverProcess.waitFor()` without a timeout. 
If the server process doesn't terminate after `destroy()` (SIGTERM), this will 
block the test indefinitely. The existing `IntegrationTestBase.tearDown()` uses 
a safer pattern: `waitFor(5, TimeUnit.SECONDS)` followed by `destroyForcibly()` 
and then `waitFor(30, TimeUnit.SECONDS)`. Consider adopting the same approach 
here to prevent hanging CI builds.



##########
tika-e2e-tests/tika-server/src/test/java/org/apache/tika/server/e2e/TikaServerHttp2Test.java:
##########
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.server.e2e;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.net.URI;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.time.Duration;
+import java.time.Instant;
+
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.Assumptions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * End-to-end test verifying that tika-server-standard supports HTTP/2 (h2c 
cleartext).
+ *
+ * Starts the real fat-jar, sends a request using Java's HttpClient configured 
for HTTP/2,
+ * and asserts the response was served over HTTP/2. This validates the runtime 
classpath
+ * has the Jetty http2-server jar and CXF negotiates h2c correctly.
+ *
+ * Run with: mvn test -pl tika-e2e-tests/tika-server -Pe2e
+ *
+ * Inspired by Lawrence Moorehead's original contribution (elemdisc/tika PR#1, 
TIKA-4679).
+ */
+@Tag("E2ETest")
+public class TikaServerHttp2Test {
+
+    private static final Logger log = 
LoggerFactory.getLogger(TikaServerHttp2Test.class);
+    private static final long SERVER_STARTUP_TIMEOUT_MS = 90_000;
+    private static final String STATUS_PATH = "/status";
+
+    private Process serverProcess;
+    private int port;
+    private String endPoint;
+
+    @BeforeEach
+    void startServer() throws Exception {
+        port = findFreePort();
+        endPoint = "http://localhost:"; + port;
+
+        String jarPath = System.getProperty("tika.server.jar");
+        if (jarPath == null) {
+            // fall back to conventional location relative to this module
+            Path moduleDir = Paths.get("").toAbsolutePath();
+            Path repoRoot = moduleDir;

Review Comment:
   The variable `moduleDir` is assigned on line 71 but never used — `repoRoot` 
is immediately initialized to the same value on line 72. You can remove 
`moduleDir` and initialize `repoRoot` directly: `Path repoRoot = 
Paths.get("").toAbsolutePath();`.
   ```suggestion
               Path repoRoot = Paths.get("").toAbsolutePath();
   ```





> Tika Server HTTP2 Support
> -------------------------
>
>                 Key: TIKA-4679
>                 URL: https://issues.apache.org/jira/browse/TIKA-4679
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>            Reporter: Lawrence Moorehead
>            Assignee: Nicholas DiPiazza
>            Priority: Minor
>
> It would be helpful to have HTTP2 support (particularly clear text 'h2c') for 
> Tika Server.
> The main motivation is that Google Cloud Run limits request sizes to 32 mb on 
> HTTP1.1, but has no hard cap with HTTP2. (Containers inside Google Cloud Run 
> run without HTTPS.)
> The CXF documentation is here for reference: 
> [https://cwiki.apache.org/confluence/display/CXF20DOC/Jetty+Configuration#JettyConfiguration-jetty_http2HTTP/2support]
> The main change needed is adding the dependencies that the underlying Jetty 
> server needs for http2, {{http2-server}} and {{{}jetty-alpn-java-server{}}}, 
> to {{{}tika-server-core{}}}.
> The documentation also says there's an 
> {{HttpServerEngineSupport#ENABLE_HTTP2}} property that could be used to 
> control if it's enabled, but it seems to be enabled by default already and 
> I'm not sure it's necessary for users to be able to explicitly disable http2 
> support.
> I made a basic smoke test for http2 support here for reference (although this 
> doesn't include the alpn library that seems to be necessary for https 
> support): 
> [https://github.com/elemdisc/tika/pull/1/changes/5b467d1636a123d740ccc2e8d37de8c042959bef]
> You can also check http2 connections with curl: 
> {code:java}
> curl --http2-prior-knowledge -v http://localhost:9998/
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to