: I was only able to confirm that the replica was out of sync by checking 
: the timesFailed field in the core’s replication.properties file.

...

:   1)Would it make sense to add some logs/telemetry to allow SolrCloud 
: users to track PULL node replication better? Or would that type of 

You can somewhat monitor for this type of situation by tracking the 
"SEARCHER.searcher.indexVersion" and/or 
"REPLICATION./replication.generation" of each core in your cluster, and 
then creating a synthetic "per-shard" metric that reports how many unique 
values there are across all the replicas of that shard.

The tricky part is that you need to *expect* that these values will differ 
between replicas anytime index changes are made on the leader, until every 
PULL replica gets around to doing a fetch -- so when/if to alert on your 
synthetic metric is hard: a single unique value per shard is great, two 
distinct values is normal, three or more distinct values should be 
suspect.

(either some replicas are failing to replicate, or your index is so 
big, and your auto commit setting is so low, that some replicas are still 
trying to catch up on gen=5, while other replicas have already replicated 
gen=7)
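
A rough sketch of that synthetic per-shard metric, in Python. It assumes you 
have already collected one value (generation or indexVersion) per replica 
from whatever metrics pipeline you use; the collection step is left out on 
purpose. The function names, the (shard, replica, value) tuple layout, and 
the status labels are all hypothetical, made up for illustration:

```python
from collections import defaultdict


def distinct_values_per_shard(samples):
    """Count the unique values reported by the replicas of each shard.

    `samples` is an iterable of (shard, replica, value) tuples, where value
    is the generation or indexVersion reported by that replica. This layout
    is an assumption for the sketch, not something Solr emits directly.
    """
    seen = defaultdict(set)
    for shard, _replica, value in samples:
        seen[shard].add(value)
    return {shard: len(values) for shard, values in seen.items()}


def classify(distinct_count):
    """Map a distinct-value count onto the rough thresholds described above."""
    if distinct_count <= 1:
        return "in_sync"   # every replica agrees
    if distinct_count == 2:
        return "normal"    # some replicas have not fetched the latest commit yet
    return "suspect"       # three or more: failing or badly lagging replicas


# Made-up sample data: three shards with three replicas each.
samples = [
    ("shard1", "replica_a", 7), ("shard1", "replica_b", 7), ("shard1", "replica_c", 7),
    ("shard2", "replica_a", 7), ("shard2", "replica_b", 6), ("shard2", "replica_c", 7),
    ("shard3", "replica_a", 7), ("shard3", "replica_b", 6), ("shard3", "replica_c", 5),
]

counts = distinct_values_per_shard(samples)
status = {shard: classify(n) for shard, n in counts.items()}
```

Since two distinct values is expected right after any commit, you would 
probably only want to alert when a shard stays "suspect" (or stays out of 
sync) across several consecutive samples, not on a single reading.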

We should probably expose a metric that tracks the same "timesFailed" 
metadata you found in replication.properties -- but that will still only 
tell you if a replication was attempted and failed. (what if the 
replication thread had a fatal exception and stopped?)


-Hoss
http://www.lucidworks.com/