janhoy commented on code in PR #3670:
URL: https://github.com/apache/solr/pull/3670#discussion_r2413075579
##########
solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java:
##########
@@ -75,40 +55,33 @@ public class ExtractingDocumentLoader extends ContentStreamLoader {
/** Extract Only supported format. Default */
public static final String XML_FORMAT = "xml";
- /** XHTML XPath parser. */
- private static final XPathParser PARSER = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
-
final SolrCore core;
final SolrParams params;
final UpdateRequestProcessor processor;
final boolean ignoreTikaException;
- protected AutoDetectParser autoDetectParser;
+ final boolean backCompat;
private final AddUpdateCommand templateAdd;
- protected TikaConfig config;
- protected ParseContextConfig parseContextConfig;
protected SolrContentHandlerFactory factory;
+ protected ExtractionBackend backend;
public ExtractingDocumentLoader(
SolrQueryRequest req,
UpdateRequestProcessor processor,
- TikaConfig config,
- ParseContextConfig parseContextConfig,
- SolrContentHandlerFactory factory) {
+ SolrContentHandlerFactory factory,
+ ExtractionBackend backend) {
this.params = req.getParams();
this.core = req.getCore();
- this.config = config;
- this.parseContextConfig = parseContextConfig;
this.processor = processor;
+ this.backCompat = params.getBool(ExtractingParams.BACK_COMPATIBILITY, true);
Review Comment:
This does not work as expected. It picks up the back-compat parameter when it
is passed on the update request, but when the same parameter is configured in
the handler definition in solrconfig.xml, it is not part of `req.getParams()`.
We also do not have access to the `initParams` NamedList here, only in the handler.
This means that back-compat is currently on by default (I believe the default
should be off) and can only be changed per request.
I tried reading this parameter in `load()` instead, but even there the
SolrQueryRequest object does not contain the initParams config from
solrconfig.xml.
As a workaround, I could pass initParams to the ExtractingDocumentLoader
constructor and fall back to reading the value from there, but I was sure that
Solr would merge init and request params automatically. Any
insight?
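
To make the fallback I have in mind concrete, here is a minimal, self-contained sketch of the precedence: request param first, then the handler's init config, then a hard-coded default. It uses plain maps rather than Solr types; `BackCompatResolver`, `resolve`, and the `"backCompat"` key are hypothetical names for illustration only, not anything in this PR.

```java
import java.util.Map;

/** Hypothetical helper illustrating the fallback order; not part of Solr or this PR. */
final class BackCompatResolver {

  // Placeholder key. The actual value of ExtractingParams.BACK_COMPATIBILITY
  // is not shown in the diff above, so this string is an assumption.
  static final String BACK_COMPAT_KEY = "backCompat";

  /**
   * Resolves the flag with this precedence:
   * 1. the value on the update request, if present
   * 2. the value from the handler's init config, if present
   * 3. the supplied hard default
   */
  static boolean resolve(
      Map<String, String> requestParams, Map<String, String> initParams, boolean hardDefault) {
    String value = requestParams.get(BACK_COMPAT_KEY);
    if (value == null) {
      // Not set on the request: fall back to the handler configuration.
      value = initParams.get(BACK_COMPAT_KEY);
    }
    return value == null ? hardDefault : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    // Request value overrides the handler config.
    System.out.println(
        resolve(Map.of(BACK_COMPAT_KEY, "true"), Map.of(BACK_COMPAT_KEY, "false"), false));
    // No request value: the handler config is used.
    System.out.println(resolve(Map.of(), Map.of(BACK_COMPAT_KEY, "true"), false));
    // Neither is set: the hard default applies.
    System.out.println(resolve(Map.of(), Map.of(), false));
  }
}
```

With Solr types, the same precedence could presumably be expressed with `SolrParams.wrapDefaults(requestParams, initDefaults)`, assuming the init config is available to the loader as `SolrParams`.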
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]