Erick Erickson updated SOLR-13913:
    Component/s: CDCR

> CDCR should limit TLOG growth
> -----------------------------
>                 Key: SOLR-13913
>                 URL: https://issues.apache.org/jira/browse/SOLR-13913
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>            Reporter: Erick Erickson
>            Priority: Major
> CDCR uses TLOGs as a queueing mechanism. If the connection between DCs goes 
> down for any reason and the outage goes unnoticed, the tlogs will grow 
> without bound, which can lead to disk-full situations and all that entails.
> Aside from that problem, it's not clear that reprocessing a zillion updates 
> is faster than a full replication anyway.
> Since the full-index replication was added, we can avoid runaway tlogs by 
> noticing that we haven't been connected to the remote DC for a long time, 
> purging the tlogs (keeping just enough for peer sync, of course), and doing 
> a full index replication the next time we connect.
> This is pretty vague; I don't have a good idea of whether tlog size is the 
> right metric, or the time since the last successful transmission, or the 
> queue size, or some combination of these and others. The point is simply 
> that once some threshold is crossed, we reset to a zero state and avoid the 
> pitfalls of continuing to accumulate updates.
> I'd suggest these be tunable parameters defined in solrconfig.xml, since I 
> can imagine that terabyte-scale indexes should fall back to full-index 
> replication more rarely than megabyte-scale indexes.
> This idea came up in discussions and I wanted to preserve it in case 
> someone wants to pursue it.
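
The threshold idea described above could be sketched roughly as follows. This is only an illustration, not existing Solr code: the class name TlogFallbackPolicy, its three limits, and the method shouldFallBack are all hypothetical, and the issue itself leaves open which metric (tlog size, time since the last successful transmission, queue size, or a combination) is the right one. The sketch simply treats crossing any one limit as the signal to purge the tlogs and schedule a full-index replication.

```java
import java.time.Duration;

/**
 * Hypothetical policy object (not part of Solr): decides when CDCR should stop
 * queueing updates in tlogs and fall back to a full-index replication instead.
 */
final class TlogFallbackPolicy {
    // All three limits are assumed tunables; the issue suggests solrconfig.xml as their home.
    private final long maxTlogBytes;
    private final Duration maxDisconnected;
    private final long maxQueuedUpdates;

    TlogFallbackPolicy(long maxTlogBytes, Duration maxDisconnected, long maxQueuedUpdates) {
        this.maxTlogBytes = maxTlogBytes;
        this.maxDisconnected = maxDisconnected;
        this.maxQueuedUpdates = maxQueuedUpdates;
    }

    /** Returns true as soon as any single limit is exceeded. */
    boolean shouldFallBack(long tlogBytes, Duration sinceLastSuccess, long queuedUpdates) {
        return tlogBytes > maxTlogBytes
            || sinceLastSuccess.compareTo(maxDisconnected) > 0
            || queuedUpdates > maxQueuedUpdates;
    }

    public static void main(String[] args) {
        // Example limits only: 10 GiB of tlogs, 6 hours disconnected, 5 million queued updates.
        TlogFallbackPolicy policy =
            new TlogFallbackPolicy(10L * 1024 * 1024 * 1024, Duration.ofHours(6), 5_000_000L);

        // Small backlog, recently connected: keep queueing.
        System.out.println(policy.shouldFallBack(1L << 20, Duration.ofMinutes(5), 1_000L));  // false
        // Disconnected for 12 hours: purge tlogs and plan a full replication.
        System.out.println(policy.shouldFallBack(1L << 20, Duration.ofHours(12), 1_000L));   // true
    }
}
```

Making the limits constructor arguments reflects the suggestion that they be tunable per deployment, so a terabyte-scale index could be given much higher limits than a megabyte-scale one.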

This message was sent by Atlassian Jira

