bschofield commented on issue #697:
URL: 
https://github.com/apache/pulsar-client-go/issues/697#issuecomment-1005484225


   > Hello @bschofield Can you provide more log info for this, thanks.
   
   It is hard for me to know exactly which log lines are relevant, but here is 
an example that looks interesting, particularly the "use of closed network 
connection":
   
   ```
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.682869,"caller":"messaging/logging_zap.go:67","msg":"[Detected
 stale connection to 
broker]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:59382"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6829515,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:59382","error":"write
 tcp 10.244.1.173:59382->10.244.0.2:6650: use of closed network connection"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6829886,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-P2","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":1,"cnx":"10.244.1.173:59382
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6830094,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-W0","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":8,"cnx":"10.244.1.173:59382
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6830208,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6,"cnx":"10.244.1.173:59382
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6830392,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:59382","error":"write
 tcp 10.244.1.173:59382->10.244.0.2:6650: use of closed network connection"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6830513,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:59382","error":"write
 tcp 10.244.1.173:59382->10.244.0.2:6650: use of closed network connection"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.6830642,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:59382","error":"write
 tcp 10.244.1.173:59382->10.244.0.2:6650: use of closed network connection"}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.6830602,"caller":"messaging/logging_zap.go:62","msg":"[runEventsLoop
 will reconnect in 
producer]","topic":"persistent://REDACTED-P2","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":1}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.6830988,"caller":"messaging/logging_zap.go:62","msg":"[Reconnecting
 to broker in  
108.668329ms]","topic":"persistent://REDACTED-P2","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":1}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.6830504,"caller":"messaging/logging_zap.go:62","msg":"[runEventsLoop
 will reconnect in 
producer]","topic":"persistent://REDACTED-W0","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":8}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.6831107,"caller":"messaging/logging_zap.go:62","msg":"[Reconnecting
 to broker in  
107.590908ms]","topic":"persistent://REDACTED-W0","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":8}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.68313,"caller":"messaging/logging_zap.go:62","msg":"[runEventsLoop
 will reconnect in 
producer]","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.6831586,"caller":"messaging/logging_zap.go:62","msg":"[Reconnecting
 to broker in  
111.142612ms]","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.7922926,"caller":"messaging/logging_zap.go:62","msg":"[Connecting
 to broker]","remote_addr":"pulsar://10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.7961068,"caller":"messaging/logging_zap.go:62","msg":"[TCP
 connection 
established]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:39250"}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.79917,"caller":"messaging/logging_zap.go:62","msg":"[Connection
 is 
ready]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:39250"}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.802493,"caller":"messaging/logging_zap.go:82","msg":"Resending
 [1780] pending 
batches","topic":"persistent://REDACTED-P2","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":1}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.8025355,"caller":"messaging/logging_zap.go:82","msg":"Resending
 [680] pending 
batches","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.8025396,"caller":"messaging/logging_zap.go:82","msg":"Resending
 [1866] pending 
batches","topic":"persistent://REDACTED-W0","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":8}
   2021-12-29 16:12:31 
{"level":"info","ts":1640794351.8164892,"caller":"messaging/logging_zap.go:62","msg":"[Reconnected
 producer to 
broker]","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6,"cnx":"10.244.1.173:39250
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.8165426,"caller":"messaging/logging_zap.go:87","msg":"Received
 ack for [ledgerId:1021695 entryId:47743  739 711] on sequenceId %!v(MISSING) - 
expected: %!v(MISSING), closing 
connection","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.8166032,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-P2","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":1,"cnx":"10.244.1.173:39250
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.8166149,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-W0","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":8,"cnx":"10.244.1.173:39250
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.8166232,"caller":"messaging/logging_zap.go:67","msg":"[Connection
 was 
closed]","topic":"persistent://REDACTED-J1","producer_name":"REDACTED-76b5b9cbf9-6rs76","producerID":6,"cnx":"10.244.1.173:39250
 -> 10.244.0.2:6650"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.8166335,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:39250","error":"write
 tcp 10.244.1.173:39250->10.244.0.2:6650: use of closed network connection"}
   2021-12-29 16:12:31 
{"level":"warn","ts":1640794351.816643,"caller":"messaging/logging_zap.go:67","msg":"[Failed
 to write on 
connection]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.1.173:39250","error":"write
 tcp 10.244.1.173:39250->10.244.0.2:6650: use of closed network connection"}
   ```
   
   I also see some other errors which might be unrelated problems, e.g. 
   
   ```
   2021-12-29 16:13:02 | 
{"level":"warn","ts":1640794382.4651952,"caller":"messaging/logging_zap.go:87","msg":"Received
 send error from server: [[PersistenceError 
org.apache.bookkeeper.mledger.ManagedLedgerException: Bookie operation 
timeout]] : 
[%!s(MISSING)]","remote_addr":"pulsar://10.244.0.2:6650","local_addr":"10.244.0.113:54102"}
   ```
   
   >The #689 does not seem to cause the problem described in this issue, 
because the reconnect stuck occurs on the prouder side
   
   I did not suggest that you looked at #689. I suggested that you look at #691.
   
   In PR #691 (https://github.com/apache/pulsar-client-go/pull/691/files), you 
changed `pulsar/producer_partition.go` so that `p.reconnectToBroker()` is 
called in a separate goroutine.
   
   My (untested) hypothesis is that one of the other `p.*()` calls from the 
original goroutine could perhaps now be happening at the same time as 
`p.reconnectToBroker()`:
   
        for {
                select {
                case i := <-p.eventsChan:
                        switch v := i.(type) {
                        case *sendRequest:
                                p.internalSend(v)
                        case *flushRequest:
                                p.internalFlush(v)
                        case *closeProducer:
                                p.internalClose(v)
                                return
                        }
                case <-p.batchFlushTicker.C:
                        if p.batchBuilder.IsMultiBatches() {
                                p.internalFlushCurrentBatches()
                        } else {
                                p.internalFlushCurrentBatch()
                        }
                }
        }
   
   I was wondering if this could cause the freeze in `p.grabCnx()` that you are 
seeing.
   
   Again, I haven't tested this idea, but thought it could be worth mentioning 
since I experienced a similar problem but only with PR #691 applied.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to