[ 
https://issues.apache.org/jira/browse/SOLR-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664330#comment-16664330
 ] 

Varun Thacker commented on SOLR-12057:
--------------------------------------

Hi Amrit,

 

Some feedback on CdcrUpdateProcessor:
 * Can we add some javadocs as to what this update processor wants to achieve?
 * Do we still need to override versionAdd / versionDelete / 
versionDeleteByQuery?
 * It would be nice to add some basic docs to the {{filterParams}} method to 
indicate what it's trying to filter etc.

On CdcrReplicaTypesTest:
 * {{//.withProperty("solr.directoryFactory", 
"solr.StandardDirectoryFactory")}} - Can we remove this comment?
 * Is {{testTlogReplica}} meant to only have tlog replicas? The create 
collection call uses a combination of nrtReplicas and tlogReplicas, so I'm 
trying to understand the motivation here.
 * "Not really, we can remove this safely, from, all tests; 2 sec sleep is for 
loading the Cdcr components and avoiding potentially few retries."  - You 
mentioned this but the patch still has a 2s delay
 * {{int batchSize = (TEST_NIGHTLY ? 100 : 10);}} - does batchSize represent 
numBatches? 100 seems to be the batch size in the inner loop
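
To illustrate the naming point above, here is a minimal sketch of how the loop might read if the outer count were called numBatches. The class, method, and constant names are hypothetical, and the nightly flag is a plain boolean standing in for TEST_NIGHTLY; the real test body is not reproduced here.

```java
public class BatchNamingSketch {
    // Documents added per batch; 100 is the inner-loop size mentioned above.
    static final int DOCS_PER_BATCH = 100;

    // Outer-loop count, i.e. what the patch currently calls batchSize.
    static int numBatches(boolean nightly) {
        return nightly ? 100 : 10;
    }

    // Counts how many documents the nested loops would index.
    static int totalDocs(boolean nightly) {
        int indexed = 0;
        for (int batch = 0; batch < numBatches(nightly); batch++) {
            for (int doc = 0; doc < DOCS_PER_BATCH; doc++) {
                indexed++; // stand-in for adding one document to the update request
            }
            // a real test would send the batch and commit here
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println("regular run: " + totalDocs(false));
        System.out.println("nightly run: " + totalDocs(true));
    }
}
```

With these names the outer value reads as a count of batches and the inner constant as the size of each batch.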

From a design perspective:

Given the improvements you've made with the patch, are we in a position to 
roll up this block from CdcrUpdateProcessor into DistributedUpdateProcessor? 
If yes, we would get CDCR to work without users having to add an 
UpdateProcessor. We could keep CdcrUpdateProcessor as is for backward 
compatibility but remove references to it from the docs.
{code:java}
if (params.get(CDCR_UPDATE) != null) {
  result.set(CDCR_UPDATE, "");
  result.set(CommonParams.VERSION_FIELD, params.get(CommonParams.VERSION_FIELD));
}
{code}
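
As a rough, self-contained sketch of what that block does: it forwards the CDCR marker and the original version to the downstream request only when the incoming request is a CDCR update. Plain maps stand in for Solr's params classes, the class name and the surrounding filterParams shape are hypothetical, and the two string constants are my assumption of the values behind CDCR_UPDATE and CommonParams.VERSION_FIELD.

```java
import java.util.HashMap;
import java.util.Map;

public class CdcrParamForwardingSketch {
    // Assumed values of CdcrUpdateProcessor.CDCR_UPDATE and CommonParams.VERSION_FIELD.
    static final String CDCR_UPDATE = "cdcr.update";
    static final String VERSION_FIELD = "_version_";

    // Hypothetical stand-in for filterParams: any other filtering the real
    // method performs is omitted, so the result starts out empty.
    static Map<String, String> filterParams(Map<String, String> params) {
        Map<String, String> result = new HashMap<>();
        if (params.get(CDCR_UPDATE) != null) {
            // Mark the forwarded request as a CDCR update and carry the source version along.
            result.put(CDCR_UPDATE, "");
            result.put(VERSION_FIELD, params.get(VERSION_FIELD));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new HashMap<>();
        incoming.put(CDCR_UPDATE, "");
        incoming.put(VERSION_FIELD, "1615402973000000000");
        incoming.put("wt", "javabin");
        System.out.println(filterParams(incoming).keySet());
    }
}
```

If this check lived in DistributedUpdateProcessor's own param filtering, a request without the CDCR marker would pass through unchanged, which is why the roll-up looks safe for non-CDCR users.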

> CDCR does not replicate to Collections with TLOG Replicas
> ---------------------------------------------------------
>
>                 Key: SOLR-12057
>                 URL: https://issues.apache.org/jira/browse/SOLR-12057
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>    Affects Versions: 7.2
>            Reporter: Webster Homer
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12057.patch, SOLR-12057.patch, SOLR-12057.patch, 
> SOLR-12057.patch, SOLR-12057.patch, cdcr-fail-with-tlog-pull.patch, 
> cdcr-fail-with-tlog-pull.patch
>
>
> We created a collection using TLOG replicas in our QA clouds.
> We have a locally hosted solrcloud with 2 nodes, all our collections have 2 
> shards. We use CDCR to replicate the collections from this environment to 2 
> data centers hosted in Google cloud. This seems to work fairly well for our 
> collections with NRT replicas. However the new TLOG collection has problems.
>  
> The google cloud solrclusters have 4 nodes each (3 separate Zookeepers). 2 
> shards per collection with 2 replicas per shard.
>  
> We never see data show up in the cloud collections, but we do see tlog files 
> show up on the cloud servers. I can see that all of the servers have cdcr 
> started, buffers are disabled.
> The cdcr source configuration is:
>  
> "requestHandler":{"/cdcr":{
>       "name":"/cdcr",
>       "class":"solr.CdcrRequestHandler",
>       "replica":[
>         {
>           "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"},
>         {
>           "zkHost":"yyyy-mzk01.sial.com:2181,yyyy-mzk02.sial.com:2181,yyyy-mzk03.sial.com:2181/solr",
>           "source":"b2b-catalog-material-180124T",
>           "target":"b2b-catalog-material-180124T"}],
>       "replicator":{
>         "threadPoolSize":4,
>         "schedule":500,
>         "batchSize":250},
>       "updateLogSynchronizer":{"schedule":60000}}}}
>  
> The target configurations in the 2 clouds are the same:
> "requestHandler":{"/cdcr":{ "name":"/cdcr", 
> "class":"solr.CdcrRequestHandler", "buffer":{"defaultState":"disabled"}}} 
>  
> All of our collections have a timestamp field, index_date. In the source 
> collection all the records have a date of 2/28/2018 but the target 
> collections have a latest date of 1/26/2018
>  
> I don't see cdcr errors in the logs, but we use logstash to search them, and 
> we're still perfecting that. 
>  
> We have a number of similar collections that behave correctly. This is the 
> only collection that is a TLOG collection. It appears that CDCR doesn't 
> support TLOG collections.
>  
> It looks like the data is getting to the target servers. I see tlog files 
> with the right timestamps. Looking at the timestamps on the documents in the 
> collection, none of the data appears to have been loaded. In the solr.log I see 
> lots of /cdcr messages  action=LASTPROCESSEDVERSION,  
> action=COLLECTIONCHECKPOINT, and  action=SHARDCHECKPOINT 
>  
> no errors
>  
> The target collections' autoCommit is set to 60000. I tried sending a commit 
> explicitly, but it made no difference. CDCR is uploading data, but no new 
> data appears in the collection.
>  


