BewareMyPower commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909838082
I still cannot reproduce the bug. For the first log file, it looks like the
reconnection logs only happened for `send`, instead of `create_producer`. I ran
your code with a little change that adds the timeout for send
```python
rv = cl.create_producer('foobar', send_timeout_millis=1000)
```
Here's the output:
```
Sent: 2000
Sent: 3000
2021-09-01 11:02:40.385 INFO [0x7000048d7000] HandlerBase:142 |
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in
0.1 s
2021-09-01 11:02:40.394 ERROR [0x7000048d7000] ClientConnection:572 |
[127.0.0.1:59743 -> 127.0.0.1:6650] Read failed: End of file
2021-09-01 11:02:40.394 INFO [0x7000048d7000] ClientConnection:1495 |
[127.0.0.1:59743 -> 127.0.0.1:6650] Connection closed
2021-09-01 11:02:40.487 INFO [0x7000048d7000] HandlerBase:64 |
[persistent://public/default/foobar, standalone-6-0] Getting connection from
pool
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ConnectionPool:86 | Deleting
stale connection from pool for pulsar://localhost:6650 use_count: 1 @
0x7fae5580c000
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ClientConnection:181 |
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ConnectionPool:96 | Created
connection for pulsar://localhost:6650
2021-09-01 11:02:40.489 ERROR [0x7000048d7000] ClientConnection:448 |
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection
refused
2021-09-01 11:02:40.489 INFO [0x7000048d7000] ClientConnection:1495 |
[<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:40.489 INFO [0x7000048d7000] HandlerBase:142 |
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in
0.182 s
2021-09-01 11:02:40.676 INFO [0x7000048d7000] HandlerBase:64 |
[persistent://public/default/foobar, standalone-6-0] Getting connection from
pool
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ConnectionPool:86 | Deleting
stale connection from pool for pulsar://localhost:6650 use_count: 1 @
0x7fae54852800
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ClientConnection:181 |
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ConnectionPool:96 | Created
connection for pulsar://localhost:6650
2021-09-01 11:02:40.678 ERROR [0x7000048d7000] ClientConnection:448 |
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection
refused
2021-09-01 11:02:40.678 INFO [0x7000048d7000] ClientConnection:1495 |
[<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:40.678 INFO [0x7000048d7000] HandlerBase:142 |
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in
0.372 s
2021-09-01 11:02:41.052 INFO [0x7000048d7000] HandlerBase:64 |
[persistent://public/default/foobar, standalone-6-0] Getting connection from
pool
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ConnectionPool:86 | Deleting
stale connection from pool for pulsar://localhost:6650 use_count: 1 @
0x7fae5384e800
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ClientConnection:181 |
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ConnectionPool:96 | Created
connection for pulsar://localhost:6650
2021-09-01 11:02:41.054 ERROR [0x7000048d7000] ClientConnection:448 |
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection
refused
2021-09-01 11:02:41.054 INFO [0x7000048d7000] ClientConnection:1495 |
[<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.054 INFO [0x7000048d7000] HandlerBase:142 |
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in
0.224 s
2021-09-01 11:02:41.281 INFO [0x7000048d7000] HandlerBase:64 |
[persistent://public/default/foobar, standalone-6-0] Getting connection from
pool
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ConnectionPool:86 | Deleting
stale connection from pool for pulsar://localhost:6650 use_count: 1 @
0x7fae5305ce00
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ClientConnection:181 |
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ConnectionPool:96 | Created
connection for pulsar://localhost:6650
2021-09-01 11:02:41.283 ERROR [0x7000048d7000] ClientConnection:448 |
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection
refused
2021-09-01 11:02:41.283 INFO [0x7000048d7000] ClientConnection:1495 |
[<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.283 INFO [0x7000048d7000] HandlerBase:142 |
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in
1.568 s
Got exception: Pulsar error: TimeOut
2021-09-01 11:02:41.390 INFO [0x118051e00] ClientConnection:181 | [<none>
-> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.390 INFO [0x118051e00] ConnectionPool:96 | Created
connection for pulsar://localhost:6650
2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientConnection:448 |
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection
refused
2021-09-01 11:02:41.392 INFO [0x7000049dd000] ClientConnection:1495 |
[<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientImpl:188 | Error
Checking/Getting Partition Metadata while creating producer on
persistent://public/default/foobar -- ConnectError
error creating producer: Pulsar error: ConnectError
```
The reconnection related logs should appear before the `Got exception:
Pulsar error: TimeOut` line, but it appeared after this line in your logs. It
might be caused by the stdout not synchronized. During `create_producer`, there
should be no logs printed by `HandlerBase`.
I suspected there's something wrong with the phase to establish connection
because your logs don't contain `Error Checking/Getting Partition Metadata`.
`create_producer` performs topic lookup at first, so if the connection cannot
be established, the topic lookup will fail.
Since you built source from master, you can add connection timeout and try
again.
```python
cl = pulsar.Client(
'pulsar://localhost:6650',
operation_timeout_seconds=1,
connection_timeout_ms=1000,
)
```
> once against an already SIGSTOPped broker.
Regarding to this case, could you enable debug logs and try again?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]