[https://issues.apache.org/jira/browse/TIKA-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062766#comment-18062766]
ASF GitHub Bot commented on TIKA-4679:
--------------------------------------
nddipiazza opened a new pull request, #2672:
URL: https://github.com/apache/tika/pull/2672
## Summary
Adds HTTP/2 (h2c cleartext) support to tika-server by including the
`org.eclipse.jetty.http2:http2-server` jar on the classpath. When this jar is
present, CXF's Jetty transport automatically negotiates HTTP/2 alongside
HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are
completely unaffected.
This implements
[TIKA-4679](https://issues.apache.org/jira/browse/TIKA-4679). The core
dependency change was originally contributed by Lawrence Moorehead
([@elemdisc](https://github.com/elemdisc)) — see [elemdisc/tika
PR#1](https://github.com/elemdisc/tika/pull/1) — and is cherry-picked here with
full author credit.
---
## Changes
### tika-parent/pom.xml
- Added `http2-server` to the dependency management block alongside the
existing `http2-hpack`, `http2-client`, `http2-common` entries (all at
`${jetty.http2.version}`)
### tika-server/tika-server-core/pom.xml _(Lawrence Moorehead's commit)_
- Added `org.eclipse.jetty.http2:http2-server` runtime dependency (version
from parent BOM)
### tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java
_(Lawrence Moorehead's commit)_
- Added `testH2c()` unit test that sends a request via
`HttpClient.Version.HTTP_2` and asserts the response was served over HTTP/2
### tika-e2e-tests/tika-server/ _(new module)_
- New e2e module that starts the actual fat-jar process and validates HTTP/2
(h2c) end-to-end
- Tests are skipped by default; run with `-Pe2e`
- Wired into `tika-e2e-tests/pom.xml`
---
## How it works
Adding `http2-server` to the classpath is sufficient for h2c (HTTP/2
cleartext) support. CXF's `JettyHTTPServerEngineFactory` detects the jar at
startup and wires in `HTTP2CServerConnectionFactory`. No startup code changes
are required.
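The detection described above amounts to a classpath probe. As a rough illustration only (this is not CXF's actual code, and `Http2ClasspathCheck` is a hypothetical name), the idea looks like this in plain Java. The class name is the one this PR mentions, assumed to live in Jetty's `org.eclipse.jetty.http2.server` package:

```java
public class Http2ClasspathCheck {
    // Class named in this PR; package assumed from Jetty's http2-server jar.
    public static final String H2C_FACTORY =
            "org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory";

    // Returns true if the named class can be loaded. The class is not
    // initialized, so probing has no side effects.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, Http2ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean h2c = isOnClasspath(H2C_FACTORY);
        System.out.println(h2c
                ? "http2-server found: an h2c connection factory can sit alongside HTTP/1.1"
                : "http2-server not found: HTTP/1.1 only");
    }
}
```

Running it with and without the `http2-server` jar on the classpath shows the two branches.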
For h2 over TLS (recommended for production), configure `TlsConfig` in
`tika-server.json`. Java 17's built-in ALPN API removes the need for a separate
ALPN boot agent, although Jetty may still need the `jetty-alpn-java-server` jar
to use it (see Potential Concerns below).
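Since Java 9, the JDK exposes server-side ALPN through `SSLEngine.setHandshakeApplicationProtocolSelector`, which is what makes the old boot agent unnecessary; Jetty reaches it through its `jetty-alpn-java-server` module. The sketch below shows only the selection step a TLS server performs (prefer `h2`, else `http/1.1`). The preference list and the class name `AlpnSelectorSketch` are assumptions for illustration, not Jetty or Tika code:

```java
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class AlpnSelectorSketch {
    // Assumed server preference: HTTP/2 first, then HTTP/1.1.
    static final List<String> SERVER_PREFERENCE = List.of("h2", "http/1.1");

    // Picks the first server-preferred protocol that the client offered.
    // An empty string tells JSSE not to use ALPN for this handshake, so the
    // connection proceeds without a negotiated protocol (plain HTTP/1.1 over TLS).
    public static String select(List<String> clientOffered) {
        for (String protocol : SERVER_PREFERENCE) {
            if (clientOffered.contains(protocol)) {
                return protocol;
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        // The default JDK context is enough to register the selector; no handshake runs here.
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);
        // JDK 9+ hook, invoked during the TLS handshake with the client's ALPN list.
        engine.setHandshakeApplicationProtocolSelector((sslEngine, offered) -> select(offered));

        System.out.println(select(List.of("http/1.1", "h2"))); // h2
        System.out.println(select(List.of("http/1.1")));       // http/1.1
        System.out.println(select(List.of("spdy/3")));         // prints an empty line
    }
}
```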
---
## Port management
- Single port (9998 by default) continues to serve both HTTP/1.1 and HTTP/2
- No second port added; Docker `EXPOSE 9998` and health-check are unchanged
- The fat-jar grows by ~500 KB from the new jar
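Sharing one port works because a client using h2c with prior knowledge (as `curl --http2-prior-knowledge` does) opens with the fixed 24-byte HTTP/2 connection preface from RFC 9113, which an HTTP/1.1 request never begins with, while h2c Upgrade clients start with an ordinary HTTP/1.1 request. A minimal, hypothetical sniffer showing how a server could tell these cases apart from the first bytes it reads (illustrative only, not Jetty's implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class H2cPrefaceSniffer {
    // Fixed HTTP/2 client connection preface (RFC 9113): 24 ASCII bytes.
    public static final byte[] PREFACE =
            "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    public enum Protocol { H2C_PRIOR_KNOWLEDGE, HTTP1, NEED_MORE_BYTES }

    // Classifies the first bytes read from a new cleartext connection.
    // HTTP1 also covers h2c Upgrade requests, which begin as HTTP/1.1.
    public static Protocol classify(byte[] firstBytes) {
        int n = Math.min(firstBytes.length, PREFACE.length);
        // Any mismatch in the bytes received so far rules out the preface.
        if (!Arrays.equals(firstBytes, 0, n, PREFACE, 0, n)) {
            return Protocol.HTTP1;
        }
        // A matching but short prefix means we must wait for more input.
        return n == PREFACE.length ? Protocol.H2C_PRIOR_KNOWLEDGE : Protocol.NEED_MORE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(classify(PREFACE)); // H2C_PRIOR_KNOWLEDGE
        System.out.println(classify("GET /tika HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII))); // HTTP1
        System.out.println(classify("PRI * HT".getBytes(StandardCharsets.US_ASCII))); // NEED_MORE_BYTES
    }
}
```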
---
## Shutdown note
HTTP/2 multiplexes multiple requests over a single TCP connection, so closing
that connection abruptly cuts off every request still in flight on it. The
current `shutdownNow()` path does not send a GOAWAY frame before closing. Under
moderate load this is acceptable for h2c, but a future improvement could add a
drain timeout for graceful HTTP/2 shutdown.
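A drain timeout follows the usual graceful-then-forced pattern. The sketch below shows it with a plain JDK `ExecutorService` standing in for the server's request-handling threads. `drainThenStop` is a hypothetical helper, not Tika or Jetty code, and a real HTTP/2 server would also send GOAWAY at the "stop accepting" step so clients stop opening new streams on the connection:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DrainThenStop {
    // Refuses new work, waits up to `drain` for in-flight tasks, and only then
    // interrupts whatever is still running. Returns true if all work finished
    // within the drain window.
    public static boolean drainThenStop(ExecutorService pool, Duration drain) {
        pool.shutdown(); // stop accepting new tasks (loosely analogous to GOAWAY)
        try {
            if (pool.awaitTermination(drain.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        pool.shutdownNow(); // forced path: interrupt remaining in-flight work
        return false;
    }

    public static void main(String[] args) {
        ExecutorService fast = Executors.newFixedThreadPool(2);
        fast.submit(() -> { /* a short request */ });
        System.out.println(drainThenStop(fast, Duration.ofSeconds(1))); // true

        ExecutorService slow = Executors.newFixedThreadPool(2);
        slow.submit(() -> {
            try {
                Thread.sleep(10_000); // a long-running request
            } catch (InterruptedException e) {
                // cut off by shutdownNow()
            }
        });
        System.out.println(drainThenStop(slow, Duration.ofMillis(100))); // false
    }
}
```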
---
## Backward compatibility
Purely additive classpath change:
- Does **not** change the default port
- Does **not** require TLS (TLS remains opt-in)
- Does **not** break any existing HTTP/1.1 client
- Does **not** change the REST API surface
---
## Testing Instructions
```bash
# Unit test (no external process)
mvn test -pl tika-server/tika-server-core \
  -Dtest=TikaServerIntegrationTest#testH2c
# E2E test (requires fat-jar to be built first)
mvn package -pl tika-server/tika-server-standard -DskipTests
mvn test -pl tika-e2e-tests/tika-server -Pe2e
```
Manually with curl (after starting the server):
```bash
# HTTP/2 cleartext (h2c)
curl --http2-prior-knowledge http://localhost:9998/tika
# HTTP/1.1 — unchanged behavior
curl http://localhost:9998/tika
```
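The same check can be scripted from Java with the JDK's `java.net.http.HttpClient`, in the spirit of what `testH2c()` is described as doing. This is a hypothetical standalone probe, not the PR's test. Over `http://` the JDK client negotiates h2c through an HTTP/1.1 `Upgrade` request rather than prior knowledge, and falls back to HTTP/1.1 if the server declines. The default URL assumes a local tika-server on port 9998:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class H2cProbe {
    // Sends one GET preferring HTTP/2 and returns the protocol version the
    // server actually used for the response.
    public static HttpClient.Version probe(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2) // preference, not a requirement
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.version();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: tika-server running locally on its default port.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:9998/tika");
        System.out.println(uri + " -> " + probe(uri));
    }
}
```

If h2c works as described, probing `http://localhost:9998/tika` should report `HTTP_2`; a server without `http2-server` on its classpath reports `HTTP_1_1`.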
---
## Review Checklist
- [ ] `http2-server` version comes from `${jetty.http2.version}` in parent
BOM (not hardcoded)
- [ ] Existing HTTP/1.1 tests still pass
- [ ] `TikaServerIntegrationTest#testH2c` passes
- [ ] E2E module compiles and tests pass with `-Pe2e`
- [ ] No second port introduced
---
## Potential Concerns
- **h2c vs h2**: This PR enables h2c (cleartext). For h2 over TLS, an
additional `jetty-alpn-java-server` dependency may be needed depending on the
Jetty version and JVM. This can be addressed in a follow-up.
- **Reverse proxies**: Many reverse proxies and load balancers (nginx, AWS ALB,
GCP LB) do not speak h2c to backends and expect h2 over TLS instead. For
internal service-to-service use, h2c is fine; for edge deployments, TLS is
recommended.
- **Fat-jar size**: The `http2-server` jar adds ~500 KB to
`tika-server-standard`. This also increases the `apache/tika` Docker image
slightly.
> Tika Server HTTP2 Support
> -------------------------
>
> Key: TIKA-4679
> URL: https://issues.apache.org/jira/browse/TIKA-4679
> Project: Tika
> Issue Type: Improvement
> Components: server
> Reporter: Lawrence Moorehead
> Priority: Minor
>
> It would be helpful to have HTTP/2 support (particularly cleartext 'h2c') for
> Tika Server.
> The main motivation is that Google Cloud Run limits request sizes to 32 MB on
> HTTP/1.1, but has no hard cap with HTTP/2. (Containers inside Google Cloud Run
> run without HTTPS.)
> The CXF documentation is here for reference:
> [https://cwiki.apache.org/confluence/display/CXF20DOC/Jetty+Configuration#JettyConfiguration-jetty_http2HTTP/2support]
> The main change needed is adding the dependencies that the underlying Jetty
> server needs for HTTP/2, {{http2-server}} and {{jetty-alpn-java-server}},
> to {{tika-server-core}}.
> The documentation also says there's an
> {{HttpServerEngineSupport#ENABLE_HTTP2}} property that could be used to
> control whether it's enabled, but it seems to be enabled by default already and
> I'm not sure it's necessary for users to be able to explicitly disable HTTP/2
> support.
> I made a basic smoke test for HTTP/2 support here for reference (although this
> doesn't include the ALPN library that seems to be necessary for HTTPS
> support):
> [https://github.com/elemdisc/tika/pull/1/changes/5b467d1636a123d740ccc2e8d37de8c042959bef]
> You can also check HTTP/2 connections with curl:
> {code:bash}
> curl --http2-prior-knowledge -v http://localhost:9998/
> {code}