BewareMyPower commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909838082


   I still cannot reproduce the bug. For the first log file, it looks like the 
reconnection logs only happened for `send`, instead of `create_producer`. I ran 
your code with a little change that adds the timeout for send
   
   ```python
           rv = cl.create_producer('foobar', send_timeout_millis=1000)
   ```
   
   Here's the output:
   
   ```
   Sent: 2000
   Sent: 3000
   2021-09-01 11:02:40.385 INFO  [0x7000048d7000] HandlerBase:142 | 
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 
0.1 s
   2021-09-01 11:02:40.394 ERROR [0x7000048d7000] ClientConnection:572 | 
[127.0.0.1:59743 -> 127.0.0.1:6650] Read failed: End of file
   2021-09-01 11:02:40.394 INFO  [0x7000048d7000] ClientConnection:1495 | 
[127.0.0.1:59743 -> 127.0.0.1:6650] Connection closed
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] HandlerBase:64 | 
[persistent://public/default/foobar, standalone-6-0] Getting connection from 
pool
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting 
stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 
0x7fae5580c000
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ClientConnection:181 | 
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ConnectionPool:96 | Created 
connection for pulsar://localhost:6650
   2021-09-01 11:02:40.489 ERROR [0x7000048d7000] ClientConnection:448 | 
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection 
refused
   2021-09-01 11:02:40.489 INFO  [0x7000048d7000] ClientConnection:1495 | 
[<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:40.489 INFO  [0x7000048d7000] HandlerBase:142 | 
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 
0.182 s
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] HandlerBase:64 | 
[persistent://public/default/foobar, standalone-6-0] Getting connection from 
pool
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting 
stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 
0x7fae54852800
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ClientConnection:181 | 
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ConnectionPool:96 | Created 
connection for pulsar://localhost:6650
   2021-09-01 11:02:40.678 ERROR [0x7000048d7000] ClientConnection:448 | 
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection 
refused
   2021-09-01 11:02:40.678 INFO  [0x7000048d7000] ClientConnection:1495 | 
[<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:40.678 INFO  [0x7000048d7000] HandlerBase:142 | 
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 
0.372 s
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] HandlerBase:64 | 
[persistent://public/default/foobar, standalone-6-0] Getting connection from 
pool
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting 
stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 
0x7fae5384e800
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ClientConnection:181 | 
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ConnectionPool:96 | Created 
connection for pulsar://localhost:6650
   2021-09-01 11:02:41.054 ERROR [0x7000048d7000] ClientConnection:448 | 
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection 
refused
   2021-09-01 11:02:41.054 INFO  [0x7000048d7000] ClientConnection:1495 | 
[<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.054 INFO  [0x7000048d7000] HandlerBase:142 | 
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 
0.224 s
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] HandlerBase:64 | 
[persistent://public/default/foobar, standalone-6-0] Getting connection from 
pool
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting 
stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 
0x7fae5305ce00
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ClientConnection:181 | 
[<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ConnectionPool:96 | Created 
connection for pulsar://localhost:6650
   2021-09-01 11:02:41.283 ERROR [0x7000048d7000] ClientConnection:448 | 
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection 
refused
   2021-09-01 11:02:41.283 INFO  [0x7000048d7000] ClientConnection:1495 | 
[<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.283 INFO  [0x7000048d7000] HandlerBase:142 | 
[persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 
1.568 s
   Got exception: Pulsar error: TimeOut
   2021-09-01 11:02:41.390 INFO  [0x118051e00] ClientConnection:181 | [<none> 
-> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.390 INFO  [0x118051e00] ConnectionPool:96 | Created 
connection for pulsar://localhost:6650
   2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientConnection:448 | 
[<none> -> pulsar://localhost:6650] Failed to establish connection: Connection 
refused
   2021-09-01 11:02:41.392 INFO  [0x7000049dd000] ClientConnection:1495 | 
[<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientImpl:188 | Error 
Checking/Getting Partition Metadata while creating producer on 
persistent://public/default/foobar -- ConnectError
   error creating producer: Pulsar error: ConnectError
   ```
   
   The reconnection related logs should appear before the `Got exception: 
Pulsar error: TimeOut` line, but it appeared after this line in your logs. It 
might be caused by the stdout not synchronized. During `create_producer`, there 
should be no logs printed by `HandlerBase`.
   
   I suspected there's something wrong with the phase to establish connection 
because your logs don't contain `Error Checking/Getting Partition Metadata`. 
`create_producer` performs topic lookup at first, so if the connection cannot 
be established, the topic lookup will fail.
   
   Since you built source from master, you can add connection timeout and try 
again.
   
   ```python
           cl = pulsar.Client(
               'pulsar://localhost:6650',
               operation_timeout_seconds=1,
               connection_timeout_ms=1000,
           )
   ```
   
   > once against an already SIGSTOPped broker.
   
   Regarding to this case, could you enable debug logs and try again? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to