[
https://issues.apache.org/jira/browse/TIKA-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080432#comment-18080432
]
ASF GitHub Bot commented on TIKA-4723:
--------------------------------------
nddipiazza opened a new pull request, #2810:
URL: https://github.com/apache/tika/pull/2810
## Summary
Follow-up fixes for
[TIKA-4723](https://issues.apache.org/jira/browse/TIKA-4723) (merged via #2809).
## Changes
### 1. `tika-parser-sqlite3-package/pom.xml` — align shade filter with sister packages
The `maven-shade-plugin` filter in `tika-parser-sqlite3-package` was missing
three exclusions present in both `tika-parser-scientific-package` and
`tika-parser-nlp-package`:
- `module-info.class` — without this exclusion, shading multiple deps that
each carry a `module-info.class` causes a duplicate-entry error in the shaded
jar on Java 9+.
- `META-INF/LICENSE.md` — duplicate clutter; the
`ApacheLicenseResourceTransformer` already handles the text-format `LICENSE`.
- `META-INF/NOTICE.md` — same rationale as `LICENSE.md`.
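
For orientation, the three exclusions listed above could look roughly like the following `maven-shade-plugin` filter. This is a minimal sketch, not a copy of the PR diff: the `*:*` artifact selector and the surrounding plugin layout are assumptions, and the real `<filters>` block in the sister packages may carry additional entries.

```xml
<!-- Sketch only: the *:* selector is an assumption, not taken from the PR -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <!-- Apply the exclusions to every shaded dependency -->
        <artifact>*:*</artifact>
        <excludes>
          <!-- Module descriptors from individual dependencies -->
          <exclude>module-info.class</exclude>
          <!-- Markdown license/notice copies; the text-format LICENSE is handled by the transformer -->
          <exclude>META-INF/LICENSE.md</exclude>
          <exclude>META-INF/NOTICE.md</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```

A wildcard selector keeps the rule independent of which dependencies happen to ship these files, which is presumably why reviewers are asked to compare the block against the scientific and nlp packages.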
### 2. `docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc` — fix incorrect TikaConfigException claim
The doc said:
> _tika-grpc requires at least one pf4j plugin to be loaded at startup; an
empty `plugins/` directory triggers a `TikaConfigException` with a download URL
pointing at Apache dist._
This is factually wrong. `TikaGrpcServerImpl` (line 133) only calls `LOG.warn`
when `pluginManager.getPlugins().isEmpty()` is true. It does **not** throw a
`TikaConfigException`. The server continues to start; fetcher-dependent RPC
calls simply fail at runtime. Corrected the description to match the actual
code path.
## Review Focus Areas
- `tika-parser-sqlite3-package/pom.xml` shade `<filters>` block — confirm
the three new exclusions are correct and complete.
- `release-artifacts.adoc` paragraph about empty plugins — confirm the new
wording accurately reflects startup behaviour.
## Critical Files
- `tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/pom.xml`
- `docs/modules/ROOT/pages/maintainers/release-guides/release-artifacts.adoc`
## Testing Instructions
```bash
# Verify the sqlite3 shaded jar builds without duplicate module-info errors
mvn package -pl tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package -am -DskipTests
# Confirm shaded jar exists and no module-info duplication
jar tf tika-parsers/tika-parsers-extended/tika-parser-sqlite3-package/target/tika-parser-sqlite3-package-*-shaded.jar \
  | grep -c module-info # should be 0
```
## Review Checklist
- [ ] sqlite3 shade filter exclusions match scientific and nlp packages
- [ ] Docs accurately describe tika-grpc startup behaviour when no plugins
loaded
> Slim down grpc?
> ---------------
>
> Key: TIKA-4723
> URL: https://issues.apache.org/jira/browse/TIKA-4723
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> For 4.0.0-beta, we should figure out if we can slim down tika-grpc mostly
> just for environmental reasons. It currently weighs in at 648MB.
> If we said we only support it in Docker, we could strip out some native libs.
> Other options? Claude, copilot and/or gemini, please help us save the
> environment!