[ https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128413#comment-13128413 ]
Jan Høydahl commented on SOLR-2842:
-----------------------------------

Yep, for distrib cloud stuff it would be cool to be able to have dedicated doc processor nodes.

I don't think the client necessarily needs to be THAT fat or complex if this is done right. If we make the UpdateChain and the Processor itself more stand-alone, not depending on SolrCore, and make updateChains easily configurable outside of solrconfig.xml (see SOLR-2841), then it would be straightforward to instantiate a chain on the client side, without the RunUpdateProcessor of course. Some processors use the Schema, so we'd perhaps need a way to fetch the correct schema from the server, using admin/file or, even better, ZK.

> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of
> UpdateRequestProcessors in order to transform them in some way and then
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side
> (SolrJ), so that we could choose to do parts or all of the processing on the
> client. The most prominent use case is extracting text (Tika) from large
> binary documents residing on local storage on the client(s). Streaming
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what
> would be more natural than reusing this - and any other processor - on the
> client side?
> However, for this to be possible, some interfaces need to change slightly.
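
To make the client-side chain idea in this thread a bit more concrete, here is a minimal sketch of a processor chain that has no dependency on SolrCore and has no indexing step at the end (that is, nothing playing the role of RunUpdateProcessor). Everything in it is an assumption made for illustration: DocProcessor, TrimProcessor, LengthProcessor and ClientChain are hypothetical names, not Solr or SolrJ classes, and a plain Map stands in for SolrInputDocument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an update processor with no SolrCore dependency.
interface DocProcessor {
    // Transforms the given document in place.
    void process(Map<String, Object> doc);
}

// Hypothetical processor: trims whitespace from a string field, if present.
final class TrimProcessor implements DocProcessor {
    private final String field;

    TrimProcessor(String field) {
        this.field = field;
    }

    @Override
    public void process(Map<String, Object> doc) {
        Object value = doc.get(field);
        if (value instanceof String) {
            doc.put(field, ((String) value).trim());
        }
    }
}

// Hypothetical processor: stores the length of a string field in another field.
final class LengthProcessor implements DocProcessor {
    private final String source;
    private final String target;

    LengthProcessor(String source, String target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public void process(Map<String, Object> doc) {
        Object value = doc.get(source);
        if (value instanceof String) {
            doc.put(target, ((String) value).length());
        }
    }
}

// Hypothetical client-side chain: runs the processors in order on a copy of
// the document and returns the result. It deliberately does not index
// anything; sending the processed document to the server is left to the caller.
final class ClientChain {
    private final List<DocProcessor> processors;

    ClientChain(List<DocProcessor> processors) {
        this.processors = new ArrayList<>(processors);
    }

    Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> copy = new HashMap<>(doc);
        for (DocProcessor processor : processors) {
            processor.process(copy);
        }
        return copy;
    }
}

// Small demo: build a chain, run one document through it, print the result.
final class ClientChainDemo {
    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "  Hello  ");
        ClientChain chain = new ClientChain(Arrays.asList(
                new TrimProcessor("title"),
                new LengthProcessor("title", "title_len")));
        System.out.println(chain.process(doc));
    }
}
```

In this sketch the processors only see the document itself, which is what would make such a chain easy to instantiate on the client. A processor that needs the schema would require some extra way of obtaining it from the server, and that part is not modeled here.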