[ 
https://issues.apache.org/jira/browse/SOLR-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143257#comment-14143257
 ] 

Joel Bernstein commented on SOLR-6266:
--------------------------------------

From my understanding the CAPIServer is listening on a port. Couchbase can be 
configured to replicate a bucket to a specific host and port. So running the 
CAPIServer on every replica just means that there will be many CAPIServers 
running. The actual replication session will be between Couchbase and a single 
CAPIServer. So in a single replication session, documents will flow to one 
CAPIServer, and that CAPIServer and its Solr instance will move the documents 
into the distributed indexing flow.

In this scenario, running a CAPIServer on all replicas really has no 
downside. 

But running the CAPIServer from just the leader has a couple of major downsides:

1) Leaders and replicas will change. Couchbase is pointing directly to an 
ip:port. If all of a sudden that node is no longer the leader, then 
replication stops. If the CAPIServer is running on all replicas then this is 
not an issue. 

2) If we run the CAPIServer everywhere we don't have to manage bringing 
CAPIServers up and down as the leader changes. So this removes quite a bit of 
complexity from the design.
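
The difference between the two deployments can be modeled in a few lines. 
This is only a sketch under stated assumptions: the class, enum, and method 
names are hypothetical, the addresses are made up, and the node Couchbase 
points at is assumed to still be alive after the leader change.

```java
// Hypothetical model for illustration; not Couchbase or Solr code.
final class ReplicationTargetSketch {
    enum Deployment { LEADER_ONLY, ALL_REPLICAS }

    // Returns true if the ip:port Couchbase was configured with still has a
    // CAPIServer listening after leadership has moved to currentLeader.
    static boolean targetStillListening(Deployment deployment,
                                        String configuredTarget,
                                        String currentLeader) {
        if (deployment == Deployment.ALL_REPLICAS) {
            return true; // every replica runs a CAPIServer
        }
        // Leader-only: the CAPIServer exists only where the leader is.
        return configuredTarget.equals(currentLeader);
    }
}
```

With LEADER_ONLY, a leader change away from the configured target leaves 
Couchbase replicating to a node with no listener. With ALL_REPLICAS the 
configured target keeps working no matter which node is leader.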

We don't have to worry about duplicate indexing on shards by running 
CAPIServers on the replicas. If we inject the documents properly into the 
SolrCloud indexing flow, then SolrCloud will ensure that documents get to the 
right place.
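
To illustrate why the receiving node does not matter: as long as the target 
shard is computed from the document id alone, every node makes the same 
routing decision. The sketch below is a simplified stand-in and not 
SolrCloud's actual router; the class name, method names, and the use of 
String.hashCode() are assumptions made for the example.

```java
import java.util.List;

// Hypothetical illustration only; SolrCloud uses its own document router.
final class ShardRoutingSketch {
    // Picks a shard purely from the document id.
    static String shardFor(String docId, List<String> shards) {
        int index = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(index);
    }

    // receivingNode is accepted only to show that it plays no part in the
    // decision: any node that receives the document picks the same shard.
    static String route(String receivingNode, String docId, List<String> shards) {
        return shardFor(docId, shards);
    }
}
```

Because the result depends only on the id and the shard list, a document 
arriving at a CAPIServer on any replica ends up at the same shard, so there 
is no duplicate indexing.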

What we do have to consider very carefully though is whether we need a 
CAPIServer running per collection or per Solr node, because this affects the 
entire design.

My thinking is that we should have a single CAPIServer per Solr node to 
service all collections. I'm assuming that the CAPIServer has thread overhead 
that we don't want for each collection. 

But if we decide to go this route then we will need to route documents to 
the correct collection based on the bucket name. We'll also need to figure 
out how to place the CAPIServer so there is only one per node. 
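
A minimal sketch of the bucket-to-collection lookup a single per-node 
CAPIServer would need. Everything here is hypothetical: the class name, the 
method names, and the idea of registering mappings explicitly are 
assumptions for illustration, not part of the attached plug-in.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: maps Couchbase bucket names to Solr collection names.
final class BucketCollectionMapping {
    // Thread-safe because one CAPIServer would serve all collections
    // on the node and may be called from several threads.
    private final Map<String, String> bucketToCollection = new ConcurrentHashMap<>();

    void register(String bucket, String collection) {
        Objects.requireNonNull(bucket, "bucket");
        Objects.requireNonNull(collection, "collection");
        bucketToCollection.put(bucket, collection);
    }

    // Empty when no collection is mapped, so the caller can decide whether
    // to reject the update or fall back to a default.
    Optional<String> collectionFor(String bucket) {
        return Optional.ofNullable(bucketToCollection.get(bucket));
    }
}
```

Returning an Optional leaves the policy for unmapped buckets open, which is 
one of the design questions still to be settled.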






> Couchbase plug-in for Solr
> --------------------------
>
>                 Key: SOLR-6266
>                 URL: https://issues.apache.org/jira/browse/SOLR-6266
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Varun
>            Assignee: Joel Bernstein
>         Attachments: solr-couchbase-plugin.tar.gz, 
> solr-couchbase-plugin.tar.gz
>
>
> It would be great if users could connect Couchbase and Solr so that updates 
> to Couchbase can automatically flow to Solr. Couchbase provides some very 
> nice APIs which allow applications to mimic the behavior of a Couchbase 
> server so that they can receive updates via Couchbase's normal cross data 
> center replication (XDCR).
> One possible design for this is to create a CouchbaseLoader that extends 
> ContentStreamLoader. This new loader would embed the Couchbase APIs that 
> listen for incoming updates from Couchbase, then marshal the Couchbase 
> updates into the normal Solr update process. 
> Instead of marshaling Couchbase updates into the normal Solr update process, 
> we could also embed a SolrJ client to relay the request through the HTTP 
> interfaces. This may be necessary if we have to handle mapping Couchbase 
> "buckets" to Solr collections on the Solr side. 


