[ 
https://issues.apache.org/jira/browse/TIKA-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080432#comment-18080432
 ] 

ASF GitHub Bot commented on TIKA-4723:
--------------------------------------

nddipiazza opened a new pull request, #2810:
URL: https://github.com/apache/tika/pull/2810

   ## Summary
   
   Follow-up fixes for 
[TIKA-4723](https://issues.apache.org/jira/browse/TIKA-4723) (merged via #2809).
   
   ## Changes
   
   ### 1. `tika-parser-sqlite3-package/pom.xml`: align shade filter with sister packages
   
   The `maven-shade-plugin` filter in `tika-parser-sqlite3-package` was missing 
three exclusions present in both `tika-parser-scientific-package` and 
`tika-parser-nlp-package`:
   
   - `module-info.class` — without this exclusion, shading multiple deps that 
each carry a `module-info.class` causes a duplicate-entry error in the shaded 
jar on Java 9+.
   - `META-INF/LICENSE.md` — duplicate clutter; the 
`ApacheLicenseResourceTransformer` already handles the text-format `LICENSE`.
   - `META-INF/NOTICE.md` — same rationale as `LICENSE.md`.
   
   ### 2. `docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc`: fix incorrect TikaConfigException claim
   
   The doc said:
   
   > _tika-grpc requires at least one pf4j plugin to be loaded at startup; an 
empty `plugins/` directory triggers a `TikaConfigException` with a download URL 
pointing at Apache dist._
   
   This is factually wrong. `TikaGrpcServerImpl` (line 133) logs a warning via 
`LOG.warn` when `pluginManager.getPlugins().isEmpty()` returns true; it does 
**not** throw a `TikaConfigException`. The server continues to start; 
fetcher-dependent RPC calls simply fail at runtime. Corrected the description 
to match the actual code path.
   
   ## Review Focus Areas
   
   - `tika-parser-sqlite3-package/pom.xml` shade `<filters>` block — confirm 
the three new exclusions are correct and complete.
   - `release-artifacts.adoc` paragraph about empty plugins — confirm the new 
wording accurately reflects startup behaviour.
   
   ## Critical Files
   
   - `tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml`
   - `docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc`
   
   ## Testing Instructions
   
   ```bash
   # Verify the sqlite3 shaded jar builds without duplicate module-info errors
   mvn package -pl tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package -am -DskipTests
   
   # Confirm the shaded jar exists and contains no module-info entries
   jar tf tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/target/tika-parser-sqlite3-package-*-shaded.jar \
     | grep -c module-info   # should be 0
   ```
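   
   The check above only covers `module-info.class`. As a rough sketch (not part 
of this PR), the same idea can be extended to all three exclusions listed under 
Changes. The function name `check_shade_exclusions` is hypothetical; it reads a 
`jar tf` listing on stdin and, like the `grep` above, matches `module-info.class` 
at any path rather than only at the jar root.
   
   ```bash
   #!/usr/bin/env bash
   # Hypothetical helper (an assumption, not in the Tika repo): reads a
   # `jar tf` listing on stdin and reports entries the shade filter should
   # have excluded. Returns 1 if any such entry is present, 0 otherwise.
   check_shade_exclusions() {
     local listing hits
     listing="$(cat)"
     # module-info.class at any depth, plus the two Markdown files in META-INF.
     hits="$(printf '%s\n' "$listing" \
       | grep -E '(^|/)module-info\.class$|^META-INF/(LICENSE|NOTICE)\.md$' || true)"
     if [ -n "$hits" ]; then
       printf 'unexpected entries:\n%s\n' "$hits"
       return 1
     fi
     echo "ok: no excluded entries found"
   }
   ```
   
   Assuming the function is defined in the current shell, it could be used as 
`jar tf <path-to-shaded-jar> | check_shade_exclusions`, with the same jar path 
as in the commands above.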
   
   ## Review Checklist
   
   - [ ] sqlite3 shade filter exclusions match scientific and nlp packages
   - [ ] Docs accurately describe tika-grpc startup behaviour when no plugins 
are loaded




> Slim down grpc?
> ---------------
>
>                 Key: TIKA-4723
>                 URL: https://issues.apache.org/jira/browse/TIKA-4723
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> For 4.0.0-beta, we should figure out if we can slim down tika-grpc mostly 
> just for environmental reasons. It currently weighs in at 648MB.
> If we said we only support it in Docker, we could strip out some native libs.
> Other options? Claude, copilot and/or gemini, please help us save the 
> environment!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
