f-ld commented on issue #5104: Handle already have connected replicate produce 
with same name
URL: https://github.com/apache/pulsar/issues/5104#issuecomment-542640572
 
 
   Additional information to understand the above logs and explanations.
   
   IPs per region:
   - region1 : 10.10.0.0/16
   - region2 : 10.11.0.0/16
   - region3 : 10.12.0.0/16
   - region4 : 10.13.0.0/16
   - region5 : 10.14.0.0/16
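
To make the addresses in the logs and stats below easier to read, here is a minimal sketch that maps an address to its region using the subnets listed above. The function name `region_of` and the accepted input forms are illustrative assumptions, not part of Pulsar.

```python
import ipaddress
from typing import Optional

# Region subnets as listed in this comment.
REGION_SUBNETS = {
    "region1": ipaddress.ip_network("10.10.0.0/16"),
    "region2": ipaddress.ip_network("10.11.0.0/16"),
    "region3": ipaddress.ip_network("10.12.0.0/16"),
    "region4": ipaddress.ip_network("10.13.0.0/16"),
    "region5": ipaddress.ip_network("10.14.0.0/16"),
}


def region_of(address: str) -> Optional[str]:
    """Return the region whose subnet contains `address`, or None.

    Accepts the forms seen in the logs: "10.12.3.213",
    "10.12.3.213:43308" and "/10.12.3.213:43308" (hypothetical helper).
    """
    host = address.lstrip("/").rsplit(":", 1)[0]  # drop leading slash and port
    ip = ipaddress.ip_address(host)
    for region, subnet in REGION_SUBNETS.items():
        if ip in subnet:
            return region
    return None
```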
   
   We have 5 brokers per region, and that topic has 12 partitions.
   
   The inbound connection from region3 to region1 reported by the 
partitioned-stats in the previous message is visible on the broker in region1:
   ```
   tcp        0      0 10.10.3.46:6650         10.12.3.213:43300       ESTABLISHED 11/java
   ```
   and on the broker in region3:
   ```
   tcp        0      0 10.12.3.213:43300       10.10.3.46:6650         ESTABLISHED 11/java
   ```
   So that inbound connection is actually real.
   
   For that specific partition, that same broker in region3 has these logs:
   ```
   10:21:58.694 [pulsar-io-22-13] INFO  org.apache.pulsar.client.impl.ProducerImpl - [persistent://tenant/namespace/topic-partition-1] [pulsar.repl.region3] Creating producer on cnx [id: 0x581a0460, L:/10.12.3.213:43308 - R:10.10.3.46/10.10.3.46:6650]
   10:21:58.893 [pulsar-io-22-2] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0x581a0460, L:/10.12.3.213:43308 - R:10.10.3.46/10.10.3.46:6650] Received error from server: Producer with name 'pulsar.repl.region3' is already connected to topic
   10:21:58.893 [pulsar-io-22-2] ERROR org.apache.pulsar.client.impl.ProducerImpl - [persistent://tenant/namespace/topic-partition-1] [pulsar.repl.region3] Failed to create producer: Producer with name 'pulsar.repl.region3' is already connected to topic
   ```
   
   And on the broker in region1 I have these logs:
   ```
   10:21:58.785 [pulsar-io-22-14] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.12.3.213:43308][persistent://tenant/namespace/topic-partition-1] Creating producer. producerId=22291
   10:21:58.785 [ForkJoinPool.commonPool-worker-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.12.3.213:43308]-22291 persistent://tenant/namespace/topic-partition-1 configured with schema false
   10:21:58.785 [ForkJoinPool.commonPool-worker-4] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.12.3.213:43308] Failed to add producer to topic persistent://tenant/namespace/topic-partition-1: Producer with name 'pulsar.repl.region3' is already connected to topic
   ```
   
   _(I picked these example logs because they involve the same brokers as the 
existing replication connection reported by partitioned-stats. Similar logs 
are available for all 12 partitions on all brokers of both regions.)_
   
   Checking partitioned stats on region3 for that partition:
   ```
         "replication" : {
           "region1" : {
             "msgRateIn" : 0.0,
             "msgThroughputIn" : 0.0,
             "msgRateOut" : 0.0,
             "msgThroughputOut" : 0.0,
             "msgRateExpired" : 4.553733596940979E-5,
             "replicationBacklog" : 0,
             "connected" : false,
             "replicationDelayInSeconds" : 0,
             "inboundConnection" : "/10.10.0.23:57932",
             "inboundConnectedSince" : "2019-10-15T08:48:49.659Z"
           },
           "region2" : {
             "msgRateIn" : 0.0,
             "msgThroughputIn" : 0.0,
             "msgRateOut" : 0.0,
             "msgThroughputOut" : 0.0,
             "msgRateExpired" : 0.0,
             "replicationBacklog" : 0,
             "connected" : true,
             "replicationDelayInSeconds" : 0,
             "inboundConnection" : "/10.11.0.182:45338",
             "inboundConnectedSince" : "2019-10-15T08:28:14.815Z",
             "outboundConnection" : "[id: 0xba518cf3, L:/10.12.1.15:37322 - R:10.11.1.120/10.11.1.120:6650]",
             "outboundConnectedSince" : "2019-10-15T09:19:45.524Z"
           },
           "region4" : {
             "msgRateIn" : 0.0,
             "msgThroughputIn" : 0.0,
             "msgRateOut" : 0.0,
             "msgThroughputOut" : 0.0,
             "msgRateExpired" : 0.0,
             "replicationBacklog" : 0,
             "connected" : true,
             "replicationDelayInSeconds" : 0,
             "inboundConnection" : "/10.13.3.30:33686",
             "inboundConnectedSince" : "2019-10-15T08:56:40.66Z",
             "outboundConnection" : "[id: 0x0e8f64e2, L:/10.12.1.15:41864 - R:10.13.3.30/10.13.3.30:6650]",
             "outboundConnectedSince" : "2019-10-15T09:19:45.355Z"
           },
           "region5" : {
             "msgRateIn" : 0.0,
             "msgThroughputIn" : 0.0,
             "msgRateOut" : 0.0,
             "msgThroughputOut" : 0.0,
             "msgRateExpired" : 0.0,
             "replicationBacklog" : 0,
             "connected" : true,
             "replicationDelayInSeconds" : 0,
             "inboundConnection" : "/10.14.1.64:56096",
             "inboundConnectedSince" : "2019-10-15T08:18:27.448Z",
             "outboundConnection" : "[id: 0x885ed221, L:/10.12.1.15:53210 - R:10.14.0.208/10.14.0.208:6650]",
             "outboundConnectedSince" : "2019-10-15T09:19:45.754Z"
           },
         },
         "deduplicationStatus" : "Disabled"
       }
   ```
   Indeed, there is no outbound connection from region3 to region1.
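
The same check can be scripted. The sketch below assumes the `replication` object has already been loaded into a Python dict (the sample is trimmed to the relevant fields from the stats above) and lists the remote regions that have an inbound connection but no outbound one. The helper name `missing_outbound` is hypothetical.

```python
# Trimmed copy of the "replication" section shown above (region3's view).
replication = {
    "region1": {
        "connected": False,
        "inboundConnection": "/10.10.0.23:57932",
    },
    "region2": {
        "connected": True,
        "inboundConnection": "/10.11.0.182:45338",
        "outboundConnection": "[id: 0xba518cf3, L:/10.12.1.15:37322 - R:10.11.1.120/10.11.1.120:6650]",
    },
    "region4": {
        "connected": True,
        "inboundConnection": "/10.13.3.30:33686",
        "outboundConnection": "[id: 0x0e8f64e2, L:/10.12.1.15:41864 - R:10.13.3.30/10.13.3.30:6650]",
    },
    "region5": {
        "connected": True,
        "inboundConnection": "/10.14.1.64:56096",
        "outboundConnection": "[id: 0x885ed221, L:/10.12.1.15:53210 - R:10.14.0.208/10.14.0.208:6650]",
    },
}


def missing_outbound(replication_stats: dict) -> list:
    """Return remote regions with an inbound but no outbound connection."""
    return sorted(
        region
        for region, stats in replication_stats.items()
        if "inboundConnection" in stats and "outboundConnection" not in stats
    )


asymmetric_regions = missing_outbound(replication)
```

With the sample above, only region1 ends up in `asymmetric_regions`.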
   
   So it looks as if the broker in region3 has lost track of its existing 
connection to the broker in region1 (it does not appear in the partitioned 
stats of region3) and tries to open it again, but fails because the brokers 
in region1 still hold it.
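
To illustrate the hypothesis, here is a toy model, not Pulsar's actual code: a topic that enforces unique producer names rejects a second registration for as long as the first entry is still held. The class `ToyTopic`, the exception type, and the assumption that the stale producer sits on the connection from port 43300 are all illustrative.

```python
class ProducerAlreadyConnected(Exception):
    """Raised when a producer name is already registered on the topic."""


class ToyTopic:
    """Toy stand-in for a topic that enforces unique producer names."""

    def __init__(self, name: str):
        self.name = name
        self.producers = {}  # producer name -> remote address

    def add_producer(self, producer_name: str, remote: str) -> None:
        if producer_name in self.producers:
            raise ProducerAlreadyConnected(
                f"Producer with name '{producer_name}' is already connected to topic"
            )
        self.producers[producer_name] = remote

    def remove_producer(self, producer_name: str) -> None:
        self.producers.pop(producer_name, None)


# The topic as seen by the broker in region1.
topic = ToyTopic("persistent://tenant/namespace/topic-partition-1")

# Assumed: the replicator from region3 registered once and was never removed.
topic.add_producer("pulsar.repl.region3", "10.12.3.213:43300")

# region3 no longer tracks that producer and retries on a new connection.
try:
    topic.add_producer("pulsar.repl.region3", "10.12.3.213:43308")
    rejected = False
except ProducerAlreadyConnected:
    rejected = True
```

In this model the retry keeps failing until something calls `remove_producer` for the stale entry, which matches the repeated errors seen in the logs.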
   
   Unfortunately, I do not have historical logs to check whether, at some 
point, the broker in region3 tried to drop the connection to region1 but 
failed (keeping the TCP connection but losing track of that outbound 
connection).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
