janhoy opened a new pull request, #3361:
URL: https://github.com/apache/solr/pull/3361

   **This PR is generated entirely by the new Google Jules AI coder 
https://jules.google**
   
   Prompt used:
   
   ```
   Read https://issues.apache.org/jira/browse/SOLR-7632 which proposes to 
deprecate the old "extraction" module, and replace it with an api-compatible 
new module that instead of parsing rich text documents in-process with Tika, 
will delegate to an externally running Tika-Server. More discussion can be 
found in https://lists.apache.org/thread/lbm6wb88gd1cfktgs6sfvw5xf73o8trd.
   
   Do not focus on deprecating the old module yet. Just make a working PR for 
the new module. You can assume that the user has provisioned a TikaServer on 
some URL. When writing tests for the module, a good idea could be to look at 
the existing tests for "extraction" handler. You can choose whether you mock 
TikaServer API in tests or spin up a TikaServer using TestContainers.
   
   The PullRequest should also add reference guide documentation for the new 
feature.
   
   Please don't use Apache httpClient. Use Jetty httpclient instead, or JDK 
httpclient. Please name the module "tika"
   ```
   
   **Below is the PR text generated by Jules, referencing the code branch it 
generated and pushed to my "janhoy" repository. I have not touched that 
branch.**
   
   ----
   
   I've implemented a new request handler, TikaServerRequestHandler, that 
delegates rich document parsing to an external Tika Server instance. This 
provides an alternative to the existing in-process ExtractingRequestHandler 
(Solr Cell), offering better resource isolation and deployment flexibility.
   
   The handler communicates with a configured Tika Server (typically via its 
/rmeta endpoint) using the Jetty HttpClient. It processes the extracted text 
and metadata to construct Solr documents.
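   
   As a rough illustration of the exchange described above, the sketch below builds the kind of request a client might send to a Tika Server /rmeta endpoint. It is not the code from this PR: it uses the JDK `java.net.http` client rather than the Jetty HttpClient so that it stays dependency-free, and the class name, method name, and parameters are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper, not part of the PR: shows the shape of an /rmeta call.
final class TikaRmetaRequestSketch {

  /** Builds (but does not send) a PUT request carrying the raw document bytes. */
  static HttpRequest buildRmetaRequest(
      String tikaServerUrl, byte[] document, String contentType, Duration timeout) {
    // Drop a trailing slash so the endpoint path is not doubled up.
    String base =
        tikaServerUrl.endsWith("/")
            ? tikaServerUrl.substring(0, tikaServerUrl.length() - 1)
            : tikaServerUrl;
    return HttpRequest.newBuilder(URI.create(base + "/rmeta"))
        .timeout(timeout)
        // Ask Tika Server for its JSON metadata representation.
        .header("Accept", "application/json")
        // Passing the content type is a hint for Tika's type detection.
        .header("Content-Type", contentType)
        .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
        .build();
  }
}
```

   Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return a JSON array of metadata objects, with the extracted text typically under the `X-TIKA:content` key. The handler in this PR may structure the call differently.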
   
   Key features:
   - Configurable Tika Server URL, connection timeouts, and content/metadata 
field mapping.
   - Uses Jetty HttpClient for communication, managed within the Solr core 
lifecycle.
   - Comprehensive unit tests for the handler and document loader.
   - New documentation page in the Solr Reference Guide.
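   
   To make the content/metadata field mapping feature more concrete, here is a minimal sketch of how Tika metadata keys could be translated into Solr field names. It is loosely modeled on the `fmap.*` and `uprefix` parameters of the existing Solr Cell handler; whether the new handler uses the same parameter names or rules is an assumption, and every name in the code is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not part of the PR: maps Tika metadata to Solr fields.
final class TikaFieldMappingSketch {

  // Key under which Tika Server's /rmeta output carries the extracted text.
  static final String TIKA_CONTENT_KEY = "X-TIKA:content";

  static Map<String, String> toSolrFields(
      Map<String, String> tikaMetadata,
      Map<String, String> fieldMap, // explicit source-to-target mappings (assumed)
      String contentField, // Solr field receiving the extracted text (assumed)
      String unknownPrefix) { // prefix for metadata keys without a mapping (assumed)
    Map<String, String> doc = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : tikaMetadata.entrySet()) {
      String key = entry.getKey();
      String target;
      if (TIKA_CONTENT_KEY.equals(key)) {
        target = contentField;
      } else if (fieldMap.containsKey(key)) {
        target = fieldMap.get(key);
      } else {
        // Lowercase the key and replace characters that are awkward in field names.
        target = unknownPrefix + key.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
      }
      doc.put(target, entry.getValue());
    }
    return doc;
  }
}
```

   For example, with `fieldMap` containing `dc:title -> title`, a content field of `content`, and a prefix of `attr_`, the metadata key `Content-Type` would end up in `attr_content_type`.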
   
   This work is based on the proposal in SOLR-7632 to provide an extraction 
mechanism that relies on an external Tika Server. The module is named 'tika' 
and the handler class is 
'org.apache.solr.handler.tika.TikaServerRequestHandler'.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended, not available for 
branches on forks living under an organisation)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
