[ 
https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128413#comment-13128413
 ] 

Jan Høydahl commented on SOLR-2842:
-----------------------------------

Yep, for distributed cloud setups it would be cool to be able to have 
dedicated document processor nodes.

I don't think the client necessarily needs to be THAT fat or complex if this is 
done right. If we make the UpdateChain and the processors themselves more 
stand-alone, not depending on SolrCore, and make update chains easily 
configurable outside of solrconfig.xml (see SOLR-2841), then it would be 
straightforward to instantiate a chain on the client side, without the 
RunUpdateProcessor of course. Some processors use the schema, so we'd perhaps 
need a way to fetch the correct schema from the server, using admin/file or, 
even better, ZK.
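
To make the idea concrete, here is a rough sketch of what a core-independent 
chain could look like on the client. It deliberately uses no Solr classes: 
DocProcessor, ClientChain and buildChain are hypothetical names invented for 
this illustration, a plain Map stands in for SolrInputDocument, and the 
"processed_on" field is made up. There is no RunUpdateProcessor equivalent; 
the chain just hands the transformed document back to the caller, which would 
then send it to Solr as usual.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClientSideChainSketch {

    /** Hypothetical stand-in for a processor that does not need SolrCore. */
    interface DocProcessor {
        void process(Map<String, Object> doc);
    }

    /** Hypothetical chain: runs a document through each processor in order. */
    static final class ClientChain {
        private final List<DocProcessor> processors;

        ClientChain(List<DocProcessor> processors) {
            this.processors = new ArrayList<>(processors);
        }

        /** Transforms the document in place and returns it to the caller. */
        Map<String, Object> run(Map<String, Object> doc) {
            for (DocProcessor p : processors) {
                p.process(doc);
            }
            return doc;
        }
    }

    /** Builds a small example chain; a real one would be read from config. */
    static ClientChain buildChain() {
        List<DocProcessor> processors = new ArrayList<>();
        // Trim every String-valued field, leave other values untouched.
        processors.add(doc ->
            doc.replaceAll((k, v) -> v instanceof String ? ((String) v).trim() : v));
        // Mark the document as processed on the client (made-up field name).
        processors.add(doc -> doc.put("processed_on", "client"));
        return new ClientChain(processors);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "  Hello Solr  ");
        // The caller would send the result to Solr after this step.
        System.out.println(buildChain().run(doc));
    }
}
```

A schema-aware processor would need the schema passed in at construction 
time, which is where fetching it from the server or ZK would come in.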
                
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of 
> UpdateRequestProcessors in order to transform them in some way and then 
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side 
> (SolrJ), so that we could choose to do parts or all of the processing on the 
> client. The most prominent use case is extracting text (Tika) from large 
> binary documents, residing on local storage on the client(s). Streaming 
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what 
> would be more natural than reusing this - and any other processor - on the 
> client side?
> However, for this to be possible, some interfaces need to change slightly.

