[ 
https://issues.apache.org/jira/browse/TIKA-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062766#comment-18062766
 ] 

ASF GitHub Bot commented on TIKA-4679:
--------------------------------------

nddipiazza opened a new pull request, #2672:
URL: https://github.com/apache/tika/pull/2672

   ## Summary
   
   Adds HTTP/2 (h2c cleartext) support to tika-server by including the 
`org.eclipse.jetty.http2:http2-server` jar on the classpath. When this jar is 
present, CXF's Jetty transport automatically negotiates HTTP/2 alongside 
HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are 
completely unaffected.
   
   This implements 
[TIKA-4679](https://issues.apache.org/jira/browse/TIKA-4679). The core 
dependency change was originally contributed by Lawrence Moorehead 
([@elemdisc](https://github.com/elemdisc)) — see [elemdisc/tika 
PR#1](https://github.com/elemdisc/tika/pull/1) — and is cherry-picked here with 
full author credit.
   
   ---
   
   ## Changes
   
   ### tika-parent/pom.xml
   - Added `http2-server` to the dependency management block alongside the 
existing `http2-hpack`, `http2-client`, `http2-common` entries (all at 
`${jetty.http2.version}`)
   
   ### tika-server/tika-server-core/pom.xml _(Lawrence Moorehead's commit)_
   - Added `org.eclipse.jetty.http2:http2-server` runtime dependency (version 
from parent BOM)
   
   ### tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java 
_(Lawrence Moorehead's commit)_
   - Added `testH2c()` unit test that sends a request via 
`HttpClient.Version.HTTP_2` and asserts the response was served over HTTP/2
   
   ### tika-e2e-tests/tika-server/ _(new module)_
   - New e2e module that starts the actual fat-jar process and validates HTTP/2 
(h2c) end-to-end
   - Tests are skipped by default; run with `-Pe2e`
   - Wired into `tika-e2e-tests/pom.xml`
   
   ---
   
   ## How it works
   
   Adding `http2-server` to the classpath is sufficient for h2c (HTTP/2 
cleartext) support. CXF's `JettyHTTPServerEngineFactory` detects the jar at 
startup and wires in `HTTP2CServerConnectionFactory`. No startup code changes 
are required.
   
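   For h2 over TLS (recommended for production), configure `TlsConfig` in 
`tika-server.json`. The JDK's built-in ALPN support (available since Java 9, so 
present in Java 17) handles protocol negotiation, so no separate ALPN boot 
agent is needed. Jetty may still need its `jetty-alpn-java-server` module on 
the classpath; see "Potential Concerns" below.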
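   
   To illustrate the JDK-level ALPN API referred to above, here is a minimal 
   sketch that advertises `h2` and `http/1.1` on an `SSLParameters` object. 
   The class and helper names (`AlpnSketch`, `withAlpn`) are hypothetical, 
   and this is not Tika, CXF, or Jetty code; those libraries are assumed to 
   do the equivalent internally once TLS is configured.
   
   ```java
   import javax.net.ssl.SSLParameters;
   import java.util.Arrays;

   public class AlpnSketch {

       // Hypothetical helper: returns TLS parameters that advertise the
       // given application protocols via the JDK's built-in ALPN support.
       static SSLParameters withAlpn(String... protocols) {
           SSLParameters params = new SSLParameters();
           params.setApplicationProtocols(protocols);
           return params;
       }

       public static void main(String[] args) {
           // Order expresses preference: HTTP/2 first, HTTP/1.1 as fallback.
           SSLParameters params = withAlpn("h2", "http/1.1");
           System.out.println(
                   Arrays.toString(params.getApplicationProtocols()));
       }
   }
   ```
   
   After a handshake, `SSLEngine#getApplicationProtocol()` reports which of 
   the advertised protocols the peer selected.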
   
   ---
   
   ## Port management
   
   - Single port (9998 by default) continues to serve both HTTP/1.1 and HTTP/2
   - No second port added; Docker `EXPOSE 9998` and health-check are unchanged
   - The fat-jar grows by ~500 KB from the new jar
   
   ---
   
   ## Shutdown note
   
   HTTP/2 multiplexes multiple requests over a single TCP connection. The 
current `shutdownNow()` path does not send a GOAWAY frame before closing. Under 
moderate load this is acceptable for h2c, but a future improvement could add a 
drain timeout for graceful HTTP/2 shutdown.
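   
   As a rough illustration of what a drain timeout means, the sketch below 
   shows the generic "stop accepting, wait, then force" pattern on a plain 
   `ExecutorService`. It assumes the `shutdownNow()` mentioned above behaves 
   like `ExecutorService#shutdownNow()`, which this PR does not show. The 
   names `DrainSketch` and `drain` are hypothetical, and the sketch does not 
   send a GOAWAY frame; that would have to happen at the Jetty level.
   
   ```java
   import java.time.Duration;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;

   public class DrainSketch {

       // Hypothetical helper: returns true if all in-flight work finished
       // within the timeout, false if the pool had to be forced down.
       static boolean drain(ExecutorService pool, Duration timeout)
               throws InterruptedException {
           pool.shutdown(); // stop accepting new work, keep running tasks
           if (pool.awaitTermination(timeout.toMillis(),
                   TimeUnit.MILLISECONDS)) {
               return true;
           }
           pool.shutdownNow(); // timeout expired: interrupt what is left
           return false;
       }

       public static void main(String[] args) throws Exception {
           ExecutorService pool = Executors.newFixedThreadPool(2);
           pool.submit(() -> System.out.println("in-flight request"));
           boolean clean = drain(pool, Duration.ofSeconds(5));
           System.out.println("drained cleanly: " + clean);
       }
   }
   ```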
   
   ---
   
   ## Backward compatibility
   
   Purely additive classpath change:
   - Does **not** change the default port
   - Does **not** require TLS (TLS remains opt-in)
   - Does **not** break any existing HTTP/1.1 client
   - Does **not** change the REST API surface
   
   ---
   
   ## Testing Instructions
   
   ```bash
   # Unit test (no external process)
   mvn test -pl tika-server/tika-server-core \
     -Dtest=TikaServerIntegrationTest#testH2c
   
   # E2E test (requires fat-jar to be built first)
   mvn package -pl tika-server/tika-server-standard -DskipTests
   mvn test -pl tika-e2e-tests/tika-server -Pe2e
   ```
   
   Manually with curl (after starting the server):
   ```bash
   # HTTP/2 cleartext (h2c)
   curl --http2-prior-knowledge http://localhost:9998/tika
   
   # HTTP/1.1 — unchanged behavior
   curl http://localhost:9998/tika
   ```
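   
   The same manual check can be sketched in Java with the JDK's 
   `java.net.http.HttpClient`, the client type that `testH2c()` is described 
   as using. This is an illustrative sketch, not the test from the PR: the 
   class name `H2cCheck` and its helpers are hypothetical, and the server is 
   assumed to be running on the default port. Unlike 
   `curl --http2-prior-knowledge`, the JDK client starts with HTTP/1.1 and 
   asks for an upgrade to h2c on cleartext `http://` URIs.
   
   ```java
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;

   public class H2cCheck {

       // Client that prefers HTTP/2 and falls back to HTTP/1.1 if the
       // server does not accept the upgrade.
       static HttpClient newClient() {
           return HttpClient.newBuilder()
                   .version(HttpClient.Version.HTTP_2)
                   .build();
       }

       // GET against the /tika endpoint used in the curl examples.
       static HttpRequest buildRequest(String baseUrl) {
           return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                   .GET()
                   .build();
       }

       public static void main(String[] args) throws Exception {
           String baseUrl =
                   args.length > 0 ? args[0] : "http://localhost:9998";
           HttpResponse<String> response = newClient().send(
                   buildRequest(baseUrl),
                   HttpResponse.BodyHandlers.ofString());
           // response.version() reports the protocol actually used.
           System.out.println("status=" + response.statusCode()
                   + " version=" + response.version());
       }
   }
   ```
   
   A reported version of `HTTP_2` indicates the upgrade succeeded; 
   `HTTP_1_1` means the client fell back.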
   
   ---
   
   ## Review Checklist
   
   - [ ] `http2-server` version comes from `${jetty.http2.version}` in parent 
BOM (not hardcoded)
   - [ ] Existing HTTP/1.1 tests still pass
   - [ ] `TikaServerIntegrationTest#testH2c` passes
   - [ ] E2E module compiles and tests pass with `-Pe2e`
   - [ ] No second port introduced
   
   ---
   
   ## Potential Concerns
   
   - **h2c vs h2**: This PR enables h2c (cleartext). For h2 over TLS an 
additional `jetty-alpn-java-server` dependency may be needed depending on the 
Jetty version and JVM. This can be addressed in a follow-up.
   - **Reverse proxies**: Most reverse proxies (nginx, AWS ALB, GCP LB) do not 
support h2c — they require h2 over TLS. For internal service-to-service use h2c 
is fine; for edge deployments, TLS is recommended.
   - **Fat-jar size**: The `http2-server` jar adds ~500 KB to 
`tika-server-standard`. This also increases the `apache/tika` Docker image 
slightly.




> Tika Server HTTP2 Support
> -------------------------
>
>                 Key: TIKA-4679
>                 URL: https://issues.apache.org/jira/browse/TIKA-4679
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>            Reporter: Lawrence Moorehead
>            Priority: Minor
>
> It would be helpful to have HTTP2 support (particularly clear text 'h2c') for 
> Tika Server.
> The main motivation is that Google Cloud Run limits request sizes to 32 MB on 
> HTTP/1.1, but has no hard cap with HTTP/2. (Containers inside Google Cloud Run 
> run without HTTPS.)
> The CXF documentation is here for reference: 
> [https://cwiki.apache.org/confluence/display/CXF20DOC/Jetty+Configuration#JettyConfiguration-jetty_http2HTTP/2support]
> The main change needed is adding the dependencies that the underlying Jetty 
> server needs for http2, {{http2-server}} and {{jetty-alpn-java-server}}, 
> to {{tika-server-core}}.
> The documentation also says there's an 
> {{HttpServerEngineSupport#ENABLE_HTTP2}} property that could be used to 
> control if it's enabled, but it seems to be enabled by default already and 
> I'm not sure it's necessary for users to be able to explicitly disable http2 
> support.
> I made a basic smoke test for http2 support here for reference (although this 
> doesn't include the alpn library that seems to be necessary for https 
> support): 
> [https://github.com/elemdisc/tika/pull/1/changes/5b467d1636a123d740ccc2e8d37de8c042959bef]
> You can also check http2 connections with curl: 
> {code:java}
> curl --http2-prior-knowledge -v http://localhost:9998/
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
