Hi,

I'd like to raise awareness of a topic that was suggested some 10 years ago (see SOLR-7632 <https://issues.apache.org/jira/browse/SOLR-7632>) and that may finally happen. It's about evolving our Extraction module to use TikaServer instead of local in-process Tika jars.

In Solr 9.x we ship Tika 1.x jars, which are end of life. It is also an anti-pattern to process huge PDFs inside Solr's JVM process. So in PR #3670 <https://github.com/apache/solr/pull/3670> I added the concept of Extraction Backends to the ExtractingRequestHandler, with TikaServer as a new backend.

I'd really like to get rid of the weight of the Tika jar dependencies in 10.0, which will soon enter its release phase. Switching to TikaServer in Solr 10 can make that happen.

The PR is fairly mature but needs more eyes before merge:

- Please voice your support for the approach
- Review the pull request
- Test the PR branch on your own data (same API, just add extraction.backend and tikaserver.url to your request handler config)

Jan
