candlerb opened a new issue #5908: [doc] Document which exceptions can be returned by API calls
URL: https://github.com/apache/pulsar/issues/5908

**Is your feature request related to a problem? Please describe.**

Looking at the Python API: it is currently undocumented which calls can raise exceptions, and under what conditions. This means the user has to experiment to find out the behaviour.

For example, take the following producer code:

```
import pulsar
import time

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('my-topic', producer_name='fred', send_timeout_millis=0)

for i in range(10):
    print("Sending %d" % i)
    producer.send(('Hello-%d' % i).encode('utf-8'))
    print("Sent %d" % i)
    time.sleep(3)

client.close()
```

1. What happens if the broker is down at the time of connection?
2. What happens if the broker is down at the time of message publication?

Answers by experimentation:

1. `pulsar.Client('pulsar://localhost:6650')` raises `Pulsar error: ConnectError` if the broker is not accessible at that time. However:
2. `producer.send` does NOT raise an exception if the broker is down. Rather, reconnection attempts take place in the background. Debug logs show this:

```
2019-12-20 16:44:33.016 INFO HandlerBase:129 | [persistent://public/default/my-topic, fred] Schedule reconnection in 0.1 s
2019-12-20 16:44:33.092 INFO ClientConnection:1337 | [127.0.0.1:37290 -> 127.0.0.1:6650] Connection closed
2019-12-20 16:44:33.093 INFO ClientConnection:229 | [127.0.0.1:37290 -> 127.0.0.1:6650] Destroyed connection
2019-12-20 16:44:33.093 INFO ClientConnection:1337 | [127.0.0.1:37300 -> 127.0.0.1:6650] Connection closed
2019-12-20 16:44:33.094 INFO ClientConnection:229 | [127.0.0.1:37300 -> 127.0.0.1:6650] Destroyed connection
2019-12-20 16:44:33.116 INFO HandlerBase:52 | [persistent://public/default/my-topic, fred] Getting connection from pool
2019-12-20 16:44:33.117 INFO ConnectionPool:62 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: -1 @ 0
2019-12-20 16:44:33.117 INFO ConnectionPool:72 | Created connection for pulsar://localhost:6650
2019-12-20 16:44:33.124 ERROR ClientConnection:374 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2019-12-20 16:44:33.124 INFO ClientConnection:1337 | [<none> -> pulsar://localhost:6650] Connection closed
... at increasing intervals
```

Since this is `send` rather than `send_async`, it blocks until the broker comes back up.

Then: suppose I set `send_timeout_millis=10000`. What happens if the message can't be sent within that time? (Answer by experiment: `producer.send` raises `Pulsar error: TimeOut`.) What about with `send_async`? (Answer: the callback function is invoked with `_pulsar.Result.Timeout`.)

Are there any other situations in which `send` or `send_async` can raise an exception? I don't know, and therefore I don't know what I might have to catch.

**Describe the solution you'd like**

Each API method should have its semantics documented, including the exceptions it may raise. In the specific example above: the documentation for [pulsar.Client.create_producer](https://pulsar.apache.org/api/python/#pulsar.Client.create_producer) should indicate which exception is raised if the broker is not reachable, and [pulsar.Producer.send](https://pulsar.apache.org/api/python/#pulsar.Producer.send) should state that it will *not* raise an exception if the broker is down, but can raise a timeout exception if the message could not be delivered within the send timeout.
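As an illustration of why this matters, here is a minimal sketch (not from the experiments above) of the defensive code a user currently has to write around `send` and `send_async`. It assumes, based only on the observations above, that a synchronous `send` failure surfaces as a generic Python exception whose message names the result code, and that the `send_async` callback receives a result value comparable against `pulsar.Result.Ok` (the same `Result` enum seen above as `_pulsar.Result.Timeout`); the topic name and timeout are arbitrary.

```
import pulsar

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('my-topic', send_timeout_millis=10000)

# Synchronous publish: per the experiment above, if the message cannot be
# delivered within the send timeout, an exception is raised ("Pulsar error:
# TimeOut"). The exact exception class is undocumented, hence the broad catch.
try:
    producer.send(b'Hello')
except Exception as e:
    print("send failed: %s" % e)

# Asynchronous publish: no exception is raised here; the callback is invoked
# later, on an internal client thread, with a result code such as Timeout.
def on_sent(result, msg_id):
    if result != pulsar.Result.Ok:
        print("send_async failed: %s" % result)

producer.send_async(b'Hello again', on_sent)

client.close()
```

With a documented "Raises:" section, the broad `except Exception` above could be narrowed to a specific exception class, and the possible callback result codes could be enumerated.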
Knowing the API contract is crucial to writing robust applications. Documenting the behaviour is also a safety net against it being changed unexpectedly.

**Describe alternatives you've considered**

Trial and error.

**Additional context**

For comparison, the confluent_kafka [documentation](https://docs.confluent.io/current/clients/confluent-kafka-python/index.html#confluent_kafka.Producer.produce) includes a "Raises:" section under each call that can raise an exception, stating which exceptions might occur.

Incidentally, the connect behaviour described above is different from Kafka's. With the confluent_kafka library, the Producer can be created even when the broker is down, and the program will attempt to connect in the background:

```
from confluent_kafka import Producer
import time

producer = Producer({"bootstrap.servers": "localhost"})  # NO EXCEPTION if broker is down

def delivery_report(err, msg):
    print("%r %r" % (err, msg))

for i in range(10):
    producer.poll(0)
    print("Sending %d" % i)
    producer.produce('my-topic', ("Hello-%d" % i).encode('utf-8'), callback=delivery_report)
    print("Sent %d" % i)
    time.sleep(3)

producer.flush()
```

Arguably this is more consistent than Pulsar: the Kafka API always performs connections for you, whereas the Pulsar API will (re)connect in the background in some situations and fail outright in others. This leaves it up to the user to implement backoff and reconnection strategies. However, the exact behaviour is less important than documenting what it is, since at least then the user knows what is expected of them.

The Kafka documentation isn't perfect either: it doesn't say explicitly whether the delivery callback is invoked in the same thread as the caller (answer: it is, unlike Pulsar). But it does say that the callback will be called from [Producer.poll](https://docs.confluent.io/current/clients/confluent-kafka-python/index.html#confluent_kafka.Producer.poll), which implies that it happens in the same thread of execution.
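To make the "backoff and reconnection strategies" point concrete, here is a rough sketch (not part of the original report) of the connect-time retry loop a Pulsar user currently has to write, given that `pulsar.Client(...)` was observed above to fail immediately with `ConnectError` when the broker is unreachable. The retry parameters are arbitrary, and the broad `except Exception` is again a consequence of the exception types being undocumented.

```
import time
import pulsar

def connect_with_backoff(url, max_attempts=5):
    # Retry pulsar.Client(), which (per the observation above) raises
    # "Pulsar error: ConnectError" when the broker is not reachable.
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return pulsar.Client(url)
        except Exception as e:
            print("connect attempt %d failed: %s" % (attempt, e))
            time.sleep(delay)
            delay = min(delay * 2, 30.0)
    raise RuntimeError("could not connect to %s after %d attempts" % (url, max_attempts))

client = connect_with_backoff('pulsar://localhost:6650')
```

If the documentation stated this behaviour explicitly, users would at least know that connect-time failures (unlike publish-time failures) are theirs to handle.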
