[
https://issues.apache.org/jira/browse/TIKA-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063203#comment-18063203
]
Nicholas DiPiazza commented on TIKA-4679:
-----------------------------------------
Hi [~lawrencem], you are correct - the tika-grpc server does not currently
support unpack/all (extracting embedded documents from container formats like
EML/PPTX/ZIP).
I have opened [TIKA-4680|https://issues.apache.org/jira/browse/TIKA-4680] to
track adding a new streaming RPC to tika-grpc:
{code:proto}
rpc Unpack(FetchAndParseRequest) returns (stream UnpackReply) {}
{code}
...where each embedded document is streamed back as a separate reply message
with its raw bytes + metadata, mirroring what {{/unpack/all}} does in
the REST server.
In the meantime, since this PR (TIKA-4679) adds HTTP/2 (h2c) support to the
REST server, {{PUT /unpack/all}} will work over HTTP/2 once this merges
- so that may be a viable path for you right now if you are using the REST
server rather than gRPC.
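A rough sketch of what that call could look like with curl, in case it helps. It assumes the server is listening on localhost:9998 (as in the curl check in the issue description) and that the default zip response is what you want; the {{TIKA_URL}} variable, the {{unpack_all}} helper and the file names are placeholders of mine, not anything shipped with Tika:
{code:bash}
# Assumption: Tika Server reachable over cleartext HTTP/2 (h2c) at this address.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Hypothetical helper: upload a container file and save the returned archive.
#   $1 = container file to send (e.g. an EML, PPTX or ZIP)
#   $2 = path where the archive of embedded documents is written
unpack_all() {
  curl --http2-prior-knowledge --fail -sS \
       -T "$1" \
       -H "Accept: application/zip" \
       -o "$2" \
       "${TIKA_URL}/unpack/all"
}

# Example (placeholder file names):
#   unpack_all message.eml embedded.zip
{code}
{{--http2-prior-knowledge}} skips the HTTP/1.1 upgrade step and speaks HTTP/2 directly over cleartext, and {{-T}} makes curl issue a PUT with the file as the request body.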
Would you be interested in contributing to TIKA-4680? You clearly have the use
case well understood.
> Tika Server HTTP2 Support
> -------------------------
>
> Key: TIKA-4679
> URL: https://issues.apache.org/jira/browse/TIKA-4679
> Project: Tika
> Issue Type: Improvement
> Components: server
> Reporter: Lawrence Moorehead
> Assignee: Nicholas DiPiazza
> Priority: Minor
>
> It would be helpful to have HTTP/2 support (particularly cleartext 'h2c') for
> Tika Server.
> The main motivation is that Google Cloud Run limits request sizes to 32 MB on
> HTTP/1.1, but has no hard cap with HTTP/2. (Containers inside Google Cloud Run
> run without HTTPS.)
> The CXF documentation is here for reference:
> [https://cwiki.apache.org/confluence/display/CXF20DOC/Jetty+Configuration#JettyConfiguration-jetty_http2HTTP/2support]
> The main change needed is adding the dependencies that the underlying Jetty
> server needs for http2, {{http2-server}} and {{jetty-alpn-java-server}},
> to {{tika-server-core}}.
> The documentation also says there's an
> {{HttpServerEngineSupport#ENABLE_HTTP2}} property that could be used to
> control if it's enabled, but it seems to be enabled by default already and
> I'm not sure it's necessary for users to be able to explicitly disable http2
> support.
> I made a basic smoke test for http2 support here for reference (although this
> doesn't include the alpn library that seems to be necessary for https
> support):
> [https://github.com/elemdisc/tika/pull/1/changes/5b467d1636a123d740ccc2e8d37de8c042959bef]
> You can also check http2 connections with curl:
> {code:bash}
> curl --http2-prior-knowledge -v http://localhost:9998/
> {code}