[ 
https://issues.apache.org/jira/browse/TIKA-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079023#comment-18079023
 ] 

Nicholas DiPiazza commented on TIKA-4723:
-----------------------------------------

Branch TIKA-4723 has been pushed with the following fixes:

h2. Changes Made

h3. 1. tika-grpc: Assembly ZIP no longer attached to Maven artifact

The {{maven-assembly-plugin}} and {{copy-dependencies}} executions have been 
moved to a new {{docker}} Maven profile, and the assembly execution is 
configured with {{<attach>false</attach>}}.

* Default build ({{mvn package}}): produces only the thin JAR (~238KB). Nothing 
large is uploaded to Nexus.
* Docker build ({{mvn package -Pdocker}}): produces the full distribution ZIP 
in {{target/}}, but it is *not* attached/deployed.
* The CI workflow in {{docker-snapshot.yml}} already runs 
{{dependency:copy-dependencies}} separately and does not use the assembly ZIP, 
so no CI changes are needed.
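
For readers who have not looked at the branch, the profile described above could look roughly like the sketch below. This is an illustration, not the POM from the branch: the execution ids, the descriptor path and the output directory are assumptions. Only the {{docker}} profile id, the two plugins and {{<attach>false</attach>}} come from the description above.

{code:xml}
<!-- Sketch only: execution ids, descriptor path and outputDirectory are assumed. -->
<profiles>
  <profile>
    <id>docker</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-dependency-plugin</artifactId>
          <executions>
            <execution>
              <id>copy-dependencies</id>
              <phase>package</phase>
              <goals>
                <goal>copy-dependencies</goal>
              </goals>
              <configuration>
                <!-- Assumed location for the copied runtime JARs -->
                <outputDirectory>${project.build.directory}/lib</outputDirectory>
              </configuration>
            </execution>
          </executions>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <executions>
            <execution>
              <id>make-assembly</id>
              <phase>package</phase>
              <goals>
                <goal>single</goal>
              </goals>
              <configuration>
                <!-- Build the ZIP in target/ but do not attach it, so install/deploy skip it -->
                <attach>false</attach>
                <descriptors>
                  <!-- Assumed descriptor path -->
                  <descriptor>src/main/assembly/assembly.xml</descriptor>
                </descriptors>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
{code}

Because both executions live inside the profile, a plain {{mvn package}} never runs them. With {{-Pdocker}} the ZIP is built, but it is not registered as a project artifact.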

h3. 2. All 15 tika-pipes-plugins: ZIPs no longer attached to Maven artifact

Added {{<attach>false</attach>}} to the {{maven-assembly-plugin}} in all 15 
plugin POMs (s3, gcs, az-blob, kafka, solr, etc.).

The plugin ZIPs are still built during {{mvn package}} (so the Docker build 
script can copy them), but they will no longer be deployed to Nexus/Maven 
Central.

h3. 3. tika-serialization: lombok scope fixed

Changed lombok from {{<scope>compile</scope>}} to {{<scope>provided</scope>}}. 
Lombok is a compile-time annotation processor and should never be a transitive 
runtime dependency.
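
As a sketch, the resulting dependency declaration would look something like this. It assumes the lombok version is managed elsewhere (for example in a parent {{dependencyManagement}} section), which is why no {{<version>}} is shown.

{code:xml}
<!-- Sketch only: version assumed to be managed by a parent POM. -->
<dependency>
  <groupId>org.projectlombok</groupId>
  <artifactId>lombok</artifactId>
  <!-- provided: on the compile classpath, but not packaged and not passed on to consumers -->
  <scope>provided</scope>
</dependency>
{code}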

h3. 4. tika-grpc README updated

Added a "Distribution and Maven Artifact" section documenting that tika-grpc is 
Docker-first and that the distribution ZIP is only built with {{-Pdocker}}.

h2. Result

After these changes, a {{mvn deploy}} of tika-grpc will upload a single thin 
JAR (~238KB) instead of a 400MB ZIP.

Branch: https://github.com/apache/tika/tree/TIKA-4723

> Slim down grpc?
> ---------------
>
>                 Key: TIKA-4723
>                 URL: https://issues.apache.org/jira/browse/TIKA-4723
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> For 4.0.0-beta, we should figure out if we can slim down tika-grpc mostly 
> just for environmental reasons. It currently weighs in at 648MB.
> If we said we only support it in Docker, we could strip out some native libs.
> Other options? Claude, copilot and/or gemini, please help us save the 
> environment!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
