fracasula commented on issue #7682:
URL: https://github.com/apache/pulsar/issues/7682#issuecomment-694085248
Hi guys, about the Golang deadlock, I'm able to replicate it with the latest
image too (i.e. `streamnative/pulsar-all:2.6.1-sn-3`). It still shows that the
client is stucks acking but it may not just be a client issue like suggested by
@alexku7. By further developing my test I was able to overlap the logs of
client and server and I can say that I consistenly get this on the server side
each time the client gets stuck:
```
time="2020-09-17T09:15:46+02:00" level=info msg="Error reading from
connection" error="Short read when reading frame size: read tcp
[::1]:48104->[::1]:32771: use of closed network connection"
local_addr="[::1]:48104" remote_addr="pulsar://localhost:32771"
07:15:46.482 [pulsar-io-51-16] INFO
org.apache.pulsar.broker.service.ServerCnx - Closed connection from
/172.17.0.1:38668
07:15:46.482 [pulsar-io-51-16] INFO
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
- Removed consumer
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
name=subscription-name-756c1ab9019bbcdc}, consumerId=1,
consumerName=consumer-name-249de6d4895d0e15, address=/172.17.0.1:38668} with
pending 200 acks
07:15:46.486 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-756c1ab9019bbcdc]
Rewind from 13:700 to 13:400
consumer_partition_integration_test.go:112: [2020-09-17
09:15:46.62477658 +0200 CEST m=+181.898034767] Trying to ack 13:426:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446798277 +0200 CEST m=+171.004720824] Trying to ack 13:1318:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.44681232 +0200 CEST m=+171.004734868] Trying to ack 13:1332:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.44682636 +0200 CEST m=+171.004748901] Trying to ack 13:1334:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446840804 +0200 CEST m=+171.004763351] Trying to ack 13:1337:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446756504 +0200 CEST m=+171.004679062] Trying to ack 13:1324:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446845301 +0200 CEST m=+171.004767844] Trying to ack 13:1338:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446863899 +0200 CEST m=+171.004786443] Trying to ack 13:1340:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446876519 +0200 CEST m=+171.004799061] Trying to ack 13:1341:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446783149 +0200 CEST m=+171.004705702] Trying to ack 13:1328:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446772341 +0200 CEST m=+171.004694903] Trying to ack 13:1322:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446789742 +0200 CEST m=+171.004712290] Trying to ack 13:1321:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446902955 +0200 CEST m=+171.004825503] Trying to ack 13:1345:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.44692216 +0200 CEST m=+171.004844705] Trying to ack 13:1347:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446938844 +0200 CEST m=+171.004861396] Trying to ack 13:1348:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446957473 +0200 CEST m=+171.004880021] Trying to ack 13:1351:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446980957 +0200 CEST m=+171.004903502] Trying to ack 13:1349:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446985213 +0200 CEST m=+171.004907763] Trying to ack 13:1353:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446798504 +0200 CEST m=+171.004721046] Trying to ack 13:1323:0
consumer_partition_integration_test.go:112: [2020-09-16
18:16:12.446791901 +0200 CEST m=+171.004714458] Trying to ack 13:1326:0
...
```
Do you think that's something that could cause the consumer to hang forever
waiting for those pending acks to be processed?
I ran it multiple times, here's the output the second time:
```
08:33:20.866 [pulsar-io-51-7] INFO
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40022] Created
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca /
subscription-name-2da7ed37505244dc
08:33:20.960 [pulsar-io-51-7] INFO
org.apache.pulsar.broker.service.ServerCnx - Closed connection from
/172.17.0.1:40022
08:33:20.960 [pulsar-io-51-7] INFO
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
- Removed consumer
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
name=subscription-name-2da7ed37505244dc}, consumerId=1,
consumerName=consumer-name-748c24a16b99354a, address=/172.17.0.1:40022} with
pending 148 ack
```
Third:
```
08:34:45.073 [pulsar-io-51-8] INFO
org.apache.pulsar.broker.service.ServerCnx - New connection from
/172.17.0.1:40030
08:34:45.074 [pulsar-io-51-8] INFO
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40030] Subscribing on
topic persistent://public/default/topic-name-61b9697b6f087aca /
subscription-name-3afa9641ff625519
08:34:45.074 [pulsar-io-51-8] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca] Cursor
subscription-name-3afa9641ff625519 recovered to position 41:99999
08:34:45.074 [pulsar-io-51-8] INFO
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl -
[public/default/persistent/topic-name-61b9697b6f087aca] Creating ledger,
metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103,
101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101,
102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116,
47, 116, 111, 112, 105, 99, 45, 110, 97, 109, 101, 45, 54, 49, 98, 57, 54, 57,
55, 98, 54, 102, 48, 56, 55, 97, 99, 97], pulsar/cursor=[115, 117, 98, 115, 99,
114, 105, 112, 116, 105, 111, 110, 45, 110, 97, 109, 101, 45, 51, 97, 102, 97,
57, 54, 52, 49, 102, 102, 54, 50, 53, 53, 49, 57], application=[112, 117, 108,
115, 97, 114]} - metadata ops timeout : 60 seconds
08:34:45.079 [main-EventThread] INFO
org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [127.0.0.1:3181] for
ledger: 45
08:34:45.081 [BookKeeperClientWorker-OrderedExecutor-5-0] INFO
org.apache.bookkeeper.mledger.impl.MetaStoreImpl -
[public/default/persistent/topic-name-61b9697b6f087aca]
[subscription-name-3afa9641ff625519] Updating cursor info ledgerId=45
mark-delete=41:99999
08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca] Updated cursor
subscription-name-3afa9641ff625519 with ledger id 45 md-position=41:99999
rd-position=41:100000
08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl -
[public/default/persistent/topic-name-61b9697b6f087aca] Opened new cursor:
ManagedCursorImpl{ledger=public/default/persistent/topic-name-61b9697b6f087aca,
name=subscription-name-3afa9641ff625519, ackPos=41:99999, readPos=41:100000}
08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
Rewind from 41:0 to 41:0
08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://public/default/topic-name-61b9697b6f087aca] There are no
replicated subscriptions on the topic
08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://public/default/topic-name-61b9697b6f087aca][subscription-name-3afa9641ff625519]
Created new subscription for 1
08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40030] Created
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca /
subscription-name-3afa9641ff625519
08:34:45.086 [pulsar-io-51-8] INFO
org.apache.pulsar.broker.service.ServerCnx - Closed connection from
/172.17.0.1:40030
08:34:45.086 [pulsar-io-51-8] INFO
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
- Removed consumer
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
name=subscription-name-3afa9641ff625519}, consumerId=1,
consumerName=consumer-name-3f0be6cd39d4812b, address=/172.17.0.1:40030} with
pending 0 acks
08:34:45.087 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
Rewind from 41:1 to 41:0
08:34:45.187 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.ServerCnx - New connection from
/172.17.0.1:40034
08:34:45.190 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40034] Subscribing on
topic persistent://public/default/topic-name-61b9697b6f087aca /
subscription-name-3afa9641ff625519
08:34:45.190 [pulsar-io-51-9] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
Rewind from 41:0 to 41:0
08:34:45.190 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://public/default/topic-name-61b9697b6f087aca] There are no
replicated subscriptions on the topic
08:34:45.191 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.persistent.PersistentTopic -
[persistent://public/default/topic-name-61b9697b6f087aca][subscription-name-3afa9641ff625519]
Created new subscription for 1
08:34:45.191 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40034] Created
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca /
subscription-name-3afa9641ff625519
08:34:45.286 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.ServerCnx - Closed connection from
/172.17.0.1:40034
08:34:45.286 [pulsar-io-51-9] INFO
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
- Removed consumer
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
name=subscription-name-3afa9641ff625519}, consumerId=1,
consumerName=consumer-name-3f0be6cd39d4812b, address=/172.17.0.1:40034} with
pending 64 acks
08:34:45.286 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl -
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
Rewind from 41:3000 to 41:2836
08:34:50.409 [pulsar-web-69-13] INFO org.eclipse.jetty.server.RequestLog -
127.0.0.1 - - [17/Sep/2020:08:34:50 +0000] "GET
/admin/v2/persistent/public/functions/coordinate/stats?getPreciseBacklog=false
HTTP/1.1" 200 1283 "-" "Pulsar-Java-v2.6.1-sn-3" 7
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]