massakam opened a new issue #6438: Size of replication backlog becomes very 
large
URL: https://github.com/apache/pulsar/issues/6438
 
 
   Recently, the number of messages in the replication backlog for a particular 
topic has become very large.
   
   This topic is replicated across two clusters, and all producers and 
consumers are connected to only one of them. Strangely, the replication 
backlog is larger in the cluster where no producers or consumers are connected. 
The following are the stats of the topic in that cluster:
   ```json
   {
     "msgRateIn" : 1410.798423526815,
     "msgThroughputIn" : 556605.2280307647,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 394.5320740005671,
     "storageSize" : 2455313235,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : {
       "jp-west" : {
         "msgRateIn" : 1410.798423526815,
         "msgThroughputIn" : 556605.2280307647,
         "msgRateOut" : 0.0,
         "msgThroughputOut" : 0.0,
         "msgRateExpired" : 0.0,
         "replicationBacklog" : 6258001,
         "connected" : false,
         "replicationDelayInSeconds" : 0,
         "inboundConnection" : "/xxx.xxx.xxx.xxx:40710",
         "inboundConnectedSince" : "2020-01-08T01:38:23.565+09:00"
       }
     },
     "deduplicationStatus" : "Disabled"
   }
   ```
   
   The notable part is `"connected" : false`. Since this topic is not active 
(no producer or consumer) in this cluster, it seems that the replicator's 
producer has been closed by topic GC.
   
   I think the cause of this issue is that the replicator throttles reading 
entries while the producer for geo-replication is closed. If the publish rate 
of messages is high, reading entries by the replicator will not keep up with 
message publishing and the replication backlog will increase.
   
https://github.com/apache/pulsar/blob/v2.3.2/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentReplicator.java#L155-L162
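
To make the suspected mechanism concrete, here is a toy model. It is not the actual Pulsar code: the class, the method names, the permit count and the "one read per tick" assumption are all hypothetical, and the throttle is assumed to reduce each read to a single entry while the producer is not writable.

```java
public class ThrottledReadSketch {

    // Hypothetical helper: how many entries the replicator asks for in its next read.
    // Assumption: while the geo-replication producer is not writable, the read size
    // is cut down to a single entry.
    static int messagesToRead(int availablePermits, int readBatchSize, boolean producerWritable) {
        if (availablePermits <= 0) {
            return 0;
        }
        int batch = Math.min(availablePermits, readBatchSize);
        return producerWritable ? batch : 1;
    }

    // Toy model: on every tick, publishedPerTick messages arrive and the replicator
    // performs exactly one read. Returns the backlog left after the given number of ticks.
    static long backlogAfter(int ticks, int publishedPerTick, int readBatchSize, boolean producerWritable) {
        long backlog = 0;
        for (int i = 0; i < ticks; i++) {
            backlog += publishedPerTick;
            int read = messagesToRead(1000, readBatchSize, producerWritable);
            backlog -= Math.min(backlog, read);
        }
        return backlog;
    }

    public static void main(String[] args) {
        // Producer writable: every tick's messages are drained.
        System.out.println(backlogAfter(100, 10, 100, true));  // prints 0
        // Producer closed: only one entry is read per tick, so the backlog grows.
        System.out.println(backlogAfter(100, 10, 100, false)); // prints 900
    }
}
```

In this model the backlog grows by `publishedPerTick - 1` on every tick while the producer is closed, which matches the symptom: the higher the publish rate, the faster the backlog grows.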
   
   It is reasonable to throttle reading of messages published to the local 
cluster while the producer for geo-replication is closed. However, there is no 
need to throttle reading messages replicated from other clusters. The 
replicator discards these messages and does not send them using the producer.
   
https://github.com/apache/pulsar/blob/v2.3.2/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentReplicator.java#L226-L232
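
A rough sketch of why the throttle looks unnecessary for these messages. Again, this is not the Pulsar implementation: `Msg`, `replicatedFrom`, `process` and the cluster name `remote-cluster` are hypothetical names used only for illustration. The point is that a message that arrived via replication is dropped without touching the producer, so it can be consumed whether or not the producer is writable.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscardReplicatedSketch {

    // Hypothetical stand-in for a message read from the topic.
    static final class Msg {
        final long id;
        final String replicatedFrom; // null when the message was published to the local cluster

        Msg(long id, String replicatedFrom) {
            this.id = id;
            this.replicatedFrom = replicatedFrom;
        }
    }

    // Walks a batch in order and returns how many messages were consumed
    // (either discarded or handed to the producer). Ids of sent messages go into `sent`.
    static int process(List<Msg> batch, boolean producerWritable, List<Long> sent) {
        int consumed = 0;
        for (Msg m : batch) {
            if (m.replicatedFrom != null) {
                // Came from another cluster: discard it. No producer is needed for this.
                consumed++;
                continue;
            }
            if (!producerWritable) {
                // A locally published message has to wait until the producer is back.
                break;
            }
            sent.add(m.id);
            consumed++;
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<Msg> batch = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            batch.add(new Msg(i, "remote-cluster"));
        }
        List<Long> sent = new ArrayList<>();
        // Producer closed, but every message is a replicated one, so all 5 are consumed.
        System.out.println(process(batch, false, sent)); // prints 5
        System.out.println(sent.size());                 // prints 0
    }
}
```

With a batch made up entirely of replicated messages, the whole batch is consumed even though the producer is closed, so reading such messages at full batch size would let the backlog drain.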
   
   - OS: CentOS 7.7
   - Pulsar: 2.4.2

