nddipiazza opened a new pull request, #2672:
URL: https://github.com/apache/tika/pull/2672

   ## Summary
   
   Adds HTTP/2 (h2c cleartext) support to tika-server by including the 
`org.eclipse.jetty.http2:http2-server` jar on the classpath. When this jar is 
present, CXF's Jetty transport automatically negotiates HTTP/2 alongside 
HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are 
completely unaffected.
   
   This implements 
[TIKA-4679](https://issues.apache.org/jira/browse/TIKA-4679). The core 
dependency change was originally contributed by Lawrence Moorehead 
([@elemdisc](https://github.com/elemdisc)) — see [elemdisc/tika 
PR#1](https://github.com/elemdisc/tika/pull/1) — and is cherry-picked here with 
full author credit.
   
   ---
   
   ## Changes
   
   ### tika-parent/pom.xml
   - Added `http2-server` to the dependency management block alongside the 
existing `http2-hpack`, `http2-client`, `http2-common` entries (all at 
`${jetty.http2.version}`)
   
   ### tika-server/tika-server-core/pom.xml _(Lawrence Moorehead's commit)_
   - Added `org.eclipse.jetty.http2:http2-server` runtime dependency (version 
from parent BOM)
   
   ### tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java _(Lawrence Moorehead's commit)_
   - Added `testH2c()` unit test that sends a request via 
`HttpClient.Version.HTTP_2` and asserts the response was served over HTTP/2
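
For readers who have not used the JDK client for this, a check of that kind can be sketched roughly as follows. This is not the PR's test code: the class and method names (`H2cProbe`, `newClient`, `newRequest`, `negotiatedVersion`) are hypothetical, and the `/tika` path is taken from the curl examples in the testing instructions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper for illustration; not the test added in this PR. */
class H2cProbe {

    /** Client that prefers HTTP/2; over plain http:// the JDK attempts an h2c upgrade. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    /** GET request against the /tika endpoint below the given base URL. */
    static HttpRequest newRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .version(HttpClient.Version.HTTP_2)
                .GET()
                .build();
    }

    /** Sends the request and reports the protocol version of the response. */
    static HttpClient.Version negotiatedVersion(String baseUrl) throws Exception {
        HttpResponse<String> response =
                newClient().send(newRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return response.version();
    }
}
```

A test would then compare `H2cProbe.negotiatedVersion("http://localhost:9998")` against `HttpClient.Version.HTTP_2`. The JDK client falls back to HTTP/1.1 when the server does not accept the upgrade, so checking the response version is what makes the test meaningful.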
   
   ### tika-e2e-tests/tika-server/ _(new module)_
   - New e2e module that starts the actual fat-jar process and validates HTTP/2 
(h2c) end-to-end
   - Tests are skipped by default; run with `-Pe2e`
   - Wired into `tika-e2e-tests/pom.xml`
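
As a rough illustration of what "starts the actual fat-jar process" involves, the sketch below builds a launch command and polls until the server port accepts connections. The class name, method names, and jar path handling are assumptions made for this example; the real e2e module may be structured differently.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

/** Hypothetical e2e helper; names are illustrative, not taken from the new module. */
class ServerProcessSupport {

    /** Command line that launches a fat jar with whichever `java` is on the PATH. */
    static List<String> launchCommand(String jarPath) {
        return List.of("java", "-jar", jarPath);
    }

    /** Polls until a TCP connect to host:port succeeds or the timeout elapses. */
    static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500);
                return true; // something is listening on the port
            } catch (IOException notYetUp) {
                Thread.sleep(100); // back off briefly before retrying
            }
        }
        return false;
    }
}
```

A test fixture could start the server with `new ProcessBuilder(ServerProcessSupport.launchCommand(jarPath)).inheritIO().start()`, call `waitForPort("localhost", 9998, 30_000)` before sending requests, and destroy the process in a `finally` block.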
   
   ---
   
   ## How it works
   
   Adding `http2-server` to the classpath is sufficient for h2c (HTTP/2 
cleartext) support. CXF's `JettyHTTPServerEngineFactory` detects the jar at 
startup and wires in `HTTP2CServerConnectionFactory`. No startup code changes 
are required.
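
When debugging a deployment, it can help to confirm that the jar really is visible to the server's class loader. The following is a minimal diagnostic sketch, not CXF's own detection logic; the class `Http2ClasspathCheck` is hypothetical, and the package name of `HTTP2CServerConnectionFactory` is the one used by Jetty's `http2-server` artifact.

```java
/** Hypothetical diagnostic; this is not how CXF performs its own detection. */
class Http2ClasspathCheck {

    // Class shipped in org.eclipse.jetty.http2:http2-server.
    static final String H2C_FACTORY =
            "org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory";

    /** True if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, Http2ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError missing) {
            return false;
        }
    }
}
```

Calling `Http2ClasspathCheck.isOnClasspath(Http2ClasspathCheck.H2C_FACTORY)` from code running inside the server should return `true` once the dependency is in place, and `false` on a build without it.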
   
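   For h2 over TLS (recommended for production), configure `TlsConfig` in `tika-server.json`. The JDK has shipped a built-in ALPN API since Java 9, so on Java 17 protocol negotiation needs no separate ALPN boot agent. Depending on the Jetty version, Jetty's `jetty-alpn-java-server` module may still have to be on the classpath (see Potential Concerns).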
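
The JDK-side ALPN support mentioned above can be illustrated with plain `javax.net.ssl` calls. This sketch only shows the JDK API that makes a separate agent unnecessary; it says nothing about how `TlsConfig` or Jetty wire it up, and `AlpnSketch` is a hypothetical name.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical illustration of the JDK's built-in ALPN API (available since Java 9). */
class AlpnSketch {

    /** Creates an engine that advertises h2 first and HTTP/1.1 as a fallback. */
    static SSLEngine engineOfferingH2() throws NoSuchAlgorithmException {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        SSLParameters params = engine.getSSLParameters();
        params.setApplicationProtocols(new String[] {"h2", "http/1.1"});
        engine.setSSLParameters(params);
        return engine;
    }
}
```

After a completed handshake, `SSLEngine.getApplicationProtocol()` reports which of the offered protocols the peers agreed on.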
   
   ---
   
   ## Port management
   
   - Single port (9998 by default) continues to serve both HTTP/1.1 and HTTP/2
   - No second port added; Docker `EXPOSE 9998` and health-check are unchanged
   - The fat-jar grows by ~500 KB from the new jar
   
   ---
   
   ## Shutdown note
   
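   HTTP/2 multiplexes multiple requests over a single TCP connection. The current `shutdownNow()` path does not send a GOAWAY frame before closing, so streams still in flight on a connection are cut off without the client being told which requests the server had already started processing. Under moderate load this is acceptable for h2c, but a future improvement could add a drain timeout for graceful HTTP/2 shutdown.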
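
The drain-timeout idea can be sketched with the generic "drain, then force" pattern on an `ExecutorService`. This is only an illustration of the pattern under the assumption that in-flight work runs on an executor; it does not send a GOAWAY frame, and `DrainingShutdown` is a hypothetical name, not code from tika-server.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a drain window before a forced shutdown. */
class DrainingShutdown {

    /** Returns true if all work finished within the drain window, false if it was forced. */
    static boolean shutdownWithDrain(ExecutorService executor, Duration drainTimeout)
            throws InterruptedException {
        executor.shutdown(); // stop accepting new work, let in-flight tasks continue
        if (executor.awaitTermination(drainTimeout.toMillis(), TimeUnit.MILLISECONDS)) {
            return true;
        }
        executor.shutdownNow(); // drain window expired: interrupt whatever is left
        return false;
    }
}
```

The return value lets the caller log whether requests were interrupted, which is useful when choosing a timeout.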
   
   ---
   
   ## Backward compatibility
   
   Purely additive classpath change:
   - Does **not** change the default port
   - Does **not** require TLS (TLS remains opt-in)
   - Does **not** break any existing HTTP/1.1 client
   - Does **not** change the REST API surface
   
   ---
   
   ## Testing Instructions
   
   ```bash
   # Unit test (no external process)
   mvn test -pl tika-server/tika-server-core -Dtest=TikaServerIntegrationTest#testH2c
   
   # E2E test (requires fat-jar to be built first)
   mvn package -pl tika-server/tika-server-standard -DskipTests
   mvn test -pl tika-e2e-tests/tika-server -Pe2e
   ```
   
   Manually with curl (after starting the server):
   ```bash
   # HTTP/2 cleartext (h2c)
   curl --http2-prior-knowledge http://localhost:9998/tika
   
   # HTTP/1.1 — unchanged behavior
   curl http://localhost:9998/tika
   ```
   
   ---
   
   ## Review Checklist
   
   - [ ] `http2-server` version comes from `${jetty.http2.version}` in parent 
BOM (not hardcoded)
   - [ ] Existing HTTP/1.1 tests still pass
   - [ ] `TikaServerIntegrationTest#testH2c` passes
   - [ ] E2E module compiles and tests pass with `-Pe2e`
   - [ ] No second port introduced
   
   ---
   
   ## Potential Concerns
   
   - **h2c vs h2**: This PR enables h2c (cleartext). For h2 over TLS an 
additional `jetty-alpn-java-server` dependency may be needed depending on the 
Jetty version and JVM. This can be addressed in a follow-up.
   - **Reverse proxies**: h2c support in reverse proxies and load balancers (nginx, AWS ALB, GCP LB) is limited and varies by product and version; many only speak HTTP/2 over TLS. For internal service-to-service use h2c is fine; for edge deployments, TLS is recommended.
   - **Fat-jar size**: The `http2-server` jar adds ~500 KB to 
`tika-server-standard`. This also increases the `apache/tika` Docker image 
slightly.

