[jira] [Commented] (SOLR-2842) Re-factor UpdateChain and UpdateProcessor interfaces

Chris Male (Commented) (JIRA) Sun, 16 Oct 2011 22:44:37 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128682#comment-13128682
 ]


Chris Male commented on SOLR-2842:
----------------------------------

I really agree with both of Yonik's points here.  I'm very wary of making the 
client API any more complex.  There are many processing pipeline 
implementations out there, so why should ours go client-side? It'd only benefit 
those using SolrJ and come at the cost of increased complexity.  Having to 
check whether the UpdateProcessor is running on a client or a server and then 
throwing Exceptions in certain circumstances... it all just feels a little 
messy!

The Tika processing situation seems a different problem, for which Yonik's 
suggestion seems very reasonable: have a local Solr instance that replicates.

I also agree with Mark.  We shouldn't strip access to SolrCore, but I think we 
can reach a middle ground where the UpdateProcessor can define whether it wants 
a SolrCore reference? Bean setters anybody? The same goes for any Schemas / 
ResourceLoaders.  We should make them all optional but definitely accessible.
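
To make the bean setter idea concrete, here is a rough sketch of how a 
processor could opt in to a core reference.  Everything in it is hypothetical: 
Core, Processor, CoreAware and run() are stand-ins invented for illustration, 
not the real SolrCore or UpdateRequestProcessor APIs, and documents are plain 
Strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OptionalCoreSketch {

    /** Hypothetical stand-in for SolrCore; NOT the real class. */
    static final class Core {
        final String name;
        Core(String name) { this.name = name; }
    }

    /** Hypothetical processor contract with no hard dependency on a core. */
    interface Processor {
        String process(String doc);
    }

    /** Opt-in: a processor that wants a core exposes a bean-style setter. */
    interface CoreAware {
        void setCore(Core core);
    }

    /** Needs no core, so it behaves the same wherever it runs. */
    static final class UppercaseProcessor implements Processor {
        public String process(String doc) {
            return doc.toUpperCase(Locale.ROOT);
        }
    }

    /** Uses the core when one was injected, and degrades quietly otherwise. */
    static final class CoreTaggingProcessor implements Processor, CoreAware {
        private Core core; // stays null when no core is available

        public void setCore(Core core) { this.core = core; }

        public String process(String doc) {
            return core == null ? doc : doc + "@" + core.name;
        }
    }

    /** Runs the chain, injecting the core only into processors that ask for it. */
    static String run(List<Processor> chain, Core coreOrNull, String doc) {
        for (Processor p : chain) {
            if (coreOrNull != null && p instanceof CoreAware) {
                ((CoreAware) p).setCore(coreOrNull);
            }
            doc = p.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Processor> chain = new ArrayList<>();
        chain.add(new UppercaseProcessor());
        chain.add(new CoreTaggingProcessor());

        // No core available: the core-aware processor just passes the doc on.
        System.out.println(run(chain, null, "doc1"));
        // Core available: it is injected through the setter before processing.
        System.out.println(run(chain, new Core("collection1"), "doc1"));
    }
}
```

The point of the sketch is only that the dependency is optional and declared 
by the processor itself, so nothing has to throw an Exception when no core is 
around.  The same pattern would apply to Schemas and ResourceLoaders.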

I don't want to seem like a downer, because I am fully for any refactorings and 
cleanups of these interfaces where possible.
                
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of 
> UpdateRequestProcessors in order to transform them in some way and then 
> (typically) index them.
> This generic "pipeline" concept would also be useful on the client side 
> (SolrJ), so that we could choose to do parts or all of the processing on the 
> client. The most prominent use case is extracting text (Tika) from large 
> binary documents, residing on local storage on the client(s). Streaming 
> hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what 
> would be more natural than reusing this - and any other processor - on the 
> client side?
> However, for this to be possible, some interfaces need to change slightly.

