[
https://issues.apache.org/jira/browse/SOLR-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048783#comment-18048783
]
Jan Høydahl commented on SOLR-18037:
------------------------------------
Proposed mailing list announcement after the 9.11 release:
{code:java}
To: [email protected], [email protected]
Subject: [ANNOUNCEMENT] Breaking Change in Solr 9.11: Local Tika Extraction Backend Removed
The Solr project normally maintains strict backwards compatibility within
a major version. However, Solr 9.11 includes an exception: the removal of
the embedded "local" Tika extraction backend from the extraction module.
WHY THIS CHANGE WAS NECESSARY
-----------------------------
Tika 1.28 has been end-of-life since September 2022. Its aging dependencies
continue to produce CVEs on a near-weekly basis. Rather than continue
shipping known vulnerabilities, the project decided to remove the local
backend entirely.
WHAT THIS MEANS FOR YOU
-----------------------
If you use the extraction module's ExtractingRequestHandler (also known as
Solr Cell) with the default local Tika backend, you must take action before
upgrading to 9.11:
1. Deploy a Tika Server instance (see https://tika.apache.org/download.html)
2. Configure Solr to use Tika Server by setting the `tikaserver.url`
parameter in your ExtractingRequestHandler configuration:
   <requestHandler name="/update/extract"
                   class="solr.extraction.ExtractingRequestHandler">
     <str name="tikaserver.url">http://your-tika-server:9998</str>
     <!-- other configuration -->
   </requestHandler>
3. Migrate any parseContext.config settings: The previous parse-context-based
configuration is no longer supported. Tika parser-specific properties must
now be configured directly on the Tika Server itself. See the Tika Server
documentation for details.
For complete documentation on configuring the tikaserver backend, see:
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
IF YOU DON'T USE THE EXTRACTION MODULE
--------------------------------------
This change does not affect you. You can upgrade to 9.11 without any action.
TIMELINE
--------
- Solr 9.10: Tika Server backend added, LocalTikaExtractionBackend deprecated
- Solr 9.11: LocalTikaExtractionBackend removed
FURTHER READING
---------------
- Indexing with Tika:
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
- Upgrade Notes:
https://solr.apache.org/guide/solr/9_11/upgrade-notes/major-changes-in-solr-9.html#solr-911
- Tika Server Documentation: https://tika.apache.org/
- SOLR-17961: https://issues.apache.org/jira/browse/SOLR-17961
- SOLR-18037: https://issues.apache.org/jira/browse/SOLR-18037
We understand this break from our normal compatibility promise may cause
inconvenience. However, continuing to ship software with known security
vulnerabilities was not acceptable. The Tika Server architecture also
provides better isolation and resource management for document extraction
workloads.
Thank you for your understanding.
- The Apache Solr Team
{code}
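For anyone unsure whether they are affected, the check described in the announcement could be scripted roughly as follows. This is only a sketch: the function name `audit_extraction_handlers` and the sample config are hypothetical, and it assumes the `tikaserver.url` and `parseContext.config` settings appear as `<str>` elements somewhere below the `<requestHandler>` element, as in the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a handler that still relies on the local backend.
SAMPLE = """
<config>
  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler">
    <str name="parseContext.config">parseContext.xml</str>
  </requestHandler>
  <requestHandler name="/select" class="solr.SearchHandler"/>
</config>
"""


def audit_extraction_handlers(solrconfig_xml):
    """Return one finding per ExtractingRequestHandler in a solrconfig.xml string."""
    root = ET.fromstring(solrconfig_xml.strip())
    findings = []
    for handler in root.iter("requestHandler"):
        if not handler.get("class", "").endswith("ExtractingRequestHandler"):
            continue
        # Collect every <str name="..."> below the handler, at any depth
        # (assumption: the settings are not defined elsewhere, e.g. in initParams).
        params = {s.get("name"): (s.text or "").strip() for s in handler.iter("str")}
        findings.append({
            "name": handler.get("name"),
            # None means no Tika Server URL is configured yet.
            "tikaserver_url": params.get("tikaserver.url") or None,
            # True means parser settings still need to move to Tika Server.
            "uses_parse_context": "parseContext.config" in params,
        })
    return findings


if __name__ == "__main__":
    for finding in audit_extraction_handlers(SAMPLE):
        print(finding)
```

A handler reported with `tikaserver_url` set to `None` still needs step 2, and one with `uses_parse_context` set to `True` still needs step 3.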
Proposed web site news:
{code:java}
Title: Breaking Change in Solr 9.11: Local Tika Removed for Security
category: solr/news
Solr 9.11 removes the embedded "local" Tika extraction backend from the
**extraction module**. This is an intentional exception to our normal
backwards compatibility policy, driven by security concerns.
### Background
Tika 1.28, which powered the local extraction backend, reached end-of-life
in September 2022. Its dependencies continue to generate CVEs regularly,
and the project determined that shipping known vulnerabilities was
unacceptable.
### Action Required
If you use the extraction module's ExtractingRequestHandler (Solr Cell),
you must deploy a [Tika Server](https://tika.apache.org/) and configure
the `tikaserver.url` parameter before upgrading to 9.11.
Users who already switched to the Tika Server backend in 9.10, the release
that introduced it, are prepared and need no additional changes.
**Note:** The `parseContext.config` option is no longer supported.
Parser-specific configuration must now be set on the Tika Server itself.
### More Information
- [Indexing with Tika](/guide/solr/latest/indexing-guide/indexing-with-tika.html):
full documentation on configuring the tikaserver backend
- [Upgrade Notes for Solr 9.11](/guide/solr/9_11/upgrade-notes/major-changes-in-solr-9.html#solr-911)
- [SOLR-18037](https://issues.apache.org/jira/browse/SOLR-18037)
- [Tika Server Documentation](https://tika.apache.org/)
{code}
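Before pointing `tikaserver.url` at a new instance, it may help to confirm that the server answers at all. The sketch below is not part of the proposed announcement. It assumes the Tika Server exposes its usual `GET /version` endpoint, which returns a short plain-text version string. The function names and the injectable `opener` parameter are hypothetical, added only so the check can be exercised without a live server.

```python
import urllib.request


def tika_version_url(base_url):
    """Build the version endpoint URL from the value used for tikaserver.url."""
    return base_url.rstrip("/") + "/version"


def tika_server_version(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the version string reported by a Tika Server.

    Raises OSError (urllib.error.URLError is a subclass) if the server
    cannot be reached or responds with an HTTP error status.
    """
    with opener(tika_version_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `tika_server_version("http://your-tika-server:9998")` against a running instance should return its version string, while a connection failure surfaces as an `OSError` that a deployment script can catch and report before the Solr upgrade proceeds.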
> Remove "local" tika extraction backend from branch_9x
> -----------------------------------------------------
>
> Key: SOLR-18037
> URL: https://issues.apache.org/jira/browse/SOLR-18037
> Project: Solr
> Issue Type: Task
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 9.11
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> See DISCUSS thread
> [https://lists.apache.org/thread/9669dm2bghgqy1xtk8l8jyvkc81oh3sv]
> This can be done by back-porting this commit
> [https://github.com/apache/solr/commit/d8546d4179db344e75e6abb39e8d791dbf50116e]
>