[ 
https://issues.apache.org/jira/browse/SOLR-18037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048783#comment-18048783
 ] 

Jan Høydahl commented on SOLR-18037:
------------------------------------

Proposed mailing list announcement after the 9.11 release:
{code:java}
To: [email protected], [email protected]
Subject: [ANNOUNCEMENT] Breaking Change in Solr 9.11: Local Tika Extraction Backend Removed

The Solr project normally maintains strict backwards compatibility within
a major version. However, Solr 9.11 includes an exception: the removal of
the embedded "local" Tika extraction backend from the extraction module.

WHY THIS CHANGE WAS NECESSARY
-----------------------------
Tika 1.28, which powered the local backend, has been end-of-life since
September 2022. Its aging dependencies continue to produce CVEs on a
near-weekly basis. Rather than continue shipping known vulnerabilities,
the project decided to remove the local backend entirely.

WHAT THIS MEANS FOR YOU
-----------------------
If you use the extraction module's ExtractingRequestHandler (also known as
Solr Cell) with the default local Tika backend, you must take action before
upgrading to 9.11:

1. Deploy a Tika Server instance (see https://tika.apache.org/download.html)

2. Configure Solr to use Tika Server by setting the `tikaserver.url`
   parameter in your ExtractingRequestHandler configuration:

   <requestHandler name="/update/extract" 
                   class="solr.extraction.ExtractingRequestHandler">
     <str name="tikaserver.url">http://your-tika-server:9998</str>
     <!-- other configuration -->
   </requestHandler>

3. Migrate any parseContext.config settings: The previous parse-context-based
   configuration is no longer supported. Tika parser-specific properties must
   now be configured directly on the Tika Server itself. See the Tika Server
   documentation for details.

For complete documentation on configuring the tikaserver backend, see:
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html

IF YOU DON'T USE THE EXTRACTION MODULE
--------------------------------------
This change does not affect you. You can upgrade to 9.11 without any action.

TIMELINE
--------
- Solr 9.10: Tika Server backend added, LocalTikaExtractionBackend deprecated
- Solr 9.11: LocalTikaExtractionBackend removed

FURTHER READING
---------------
- Indexing with Tika: https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
- Upgrade Notes: https://solr.apache.org/guide/solr/9_11/upgrade-notes/major-changes-in-solr-9.html#solr-911
- Tika Server Documentation: https://tika.apache.org/
- SOLR-17961: https://issues.apache.org/jira/browse/SOLR-17961
- SOLR-18037: https://issues.apache.org/jira/browse/SOLR-18037

We understand this break from our normal compatibility promise may cause
inconvenience. However, continuing to ship software with known security
vulnerabilities was not acceptable. The Tika Server architecture also
provides better isolation and resource management for document extraction
workloads.

Thank you for your understanding.

- The Apache Solr Team
{code}
Proposed web site news:
{code:java}
Title: Breaking Change in Solr 9.11: Local Tika Removed for Security
category: solr/news

Solr 9.11 removes the embedded "local" Tika extraction backend from the 
**extraction module**. This is an intentional exception to our normal 
backwards compatibility policy, driven by security concerns.

### Background

Tika 1.28, which powered the local extraction backend, reached end-of-life 
in September 2022. Its dependencies continue to generate CVEs regularly, 
and the project determined that shipping known vulnerabilities was 
unacceptable.

### Action Required

If you use the extraction module's ExtractingRequestHandler (Solr Cell), 
you must deploy a [Tika Server](https://tika.apache.org/) and configure 
the `tikaserver.url` parameter before upgrading to 9.11.

Users who already switched to the Tika Server backend, introduced in 9.10, 
are prepared and need no additional changes.

**Note:** The `parseContext.config` option is no longer supported. 
Parser-specific configuration must now be set on the Tika Server itself.

### More Information

- [Indexing with Tika](/guide/solr/latest/indexing-guide/indexing-with-tika.html): full documentation on configuring the tikaserver backend
- [Upgrade Notes for Solr 9.11](/guide/solr/9_11/upgrade-notes/major-changes-in-solr-9.html#solr-911)
- [SOLR-18037](https://issues.apache.org/jira/browse/SOLR-18037)
- [Tika Server Documentation](https://tika.apache.org/)
{code}
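
Both drafts ask users to deploy a Tika Server and point `tikaserver.url` at it before upgrading. A pre-upgrade reachability check could look roughly like the sketch below. It assumes the Tika Server answers `GET /version` at the configured base URL with a short plain-text version string. The helper names `version_url` and `tika_server_version` are hypothetical and not part of Solr or Tika, and `your-tika-server:9998` is the placeholder host from the example config.

```python
"""Pre-upgrade check: is the Tika Server behind tikaserver.url reachable?

Sketch only. Assumes the server answers GET <base>/version with a short
plain-text version string. Helper names are hypothetical.
"""
import urllib.error
import urllib.request
from typing import Optional


def version_url(base_url: str) -> str:
    """Build the /version URL from the value used for tikaserver.url."""
    return base_url.rstrip("/") + "/version"


def tika_server_version(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the version text reported by the server, or None on failure."""
    try:
        with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace").strip()
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, DNS failures, timeouts,
        # HTTP error statuses, and malformed URLs.
        return None
```

Calling `tika_server_version("http://your-tika-server:9998")` returns the reported version text, or `None` if the server cannot be reached. This only confirms that something answers at that URL; it does not exercise extraction itself.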

> Remove "local" tika extraction backend from branch_9x
> -----------------------------------------------------
>
>                 Key: SOLR-18037
>                 URL: https://issues.apache.org/jira/browse/SOLR-18037
>             Project: Solr
>          Issue Type: Task
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 9.11
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See DISCUSS thread 
> [https://lists.apache.org/thread/9669dm2bghgqy1xtk8l8jyvkc81oh3sv].
> This can be done by back-porting this commit 
> [https://github.com/apache/solr/commit/d8546d4179db344e75e6abb39e8d791dbf50116e].
>  



