### Motivation

1. Currently, the Presto Pulsar connector reads synchronously from BookKeeper whenever it runs out of entries to process. In other words, it processes a batch of entries and only then reads more. Ideally, reading and processing should happen in parallel to increase throughput.
2. Each split initializes its own ManagedLedgerFactory/BookKeeper client. A single BookKeeper client shared among threads is sufficient.

### Modifications

1. Rewrote the logic in the Presto Pulsar connector to read asynchronously and process entries in parallel.
2. Cache the ManagedLedgerFactory so it can be reused across splits.

### Result

I see about a 2x throughput improvement on a single node as well as on a cluster (2 brokers, 3 bookies, 4 Presto workers including the coordinator) on AWS.

[ Full content available at: https://github.com/apache/incubator-pulsar/pull/2564 ]
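### Illustrative sketch

The two modifications can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `LedgerClient`, `readBatchAsync`, `sharedClient`, and `processSplit` are hypothetical names, and the "client" only simulates reads using `CompletableFuture`. The sketch assumes the idea is (a) to issue the next read before processing the current batch, and (b) to keep one client per service URL in a cache shared by all splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncReadSketch {

    /** Hypothetical stand-in for an expensive ManagedLedgerFactory/BookKeeper client. */
    static final class LedgerClient {
        final String serviceUrl;

        LedgerClient(String serviceUrl) {
            this.serviceUrl = serviceUrl;
        }

        /** Simulates an asynchronous read of up to `size` entries from [start, end). */
        CompletableFuture<List<Integer>> readBatchAsync(int start, int size, int end) {
            return CompletableFuture.supplyAsync(() -> {
                List<Integer> batch = new ArrayList<>();
                for (int i = start; i < Math.min(start + size, end); i++) {
                    batch.add(i);
                }
                return batch;
            });
        }
    }

    // One client per service URL, shared by every split (assumed cache key).
    private static final Map<String, LedgerClient> CLIENTS = new ConcurrentHashMap<>();

    static LedgerClient sharedClient(String serviceUrl) {
        // computeIfAbsent creates the client at most once per key, even under contention.
        return CLIENTS.computeIfAbsent(serviceUrl, LedgerClient::new);
    }

    /** Sums entries [0, total), fetching the next batch while the current one is processed. */
    static long processSplit(LedgerClient client, int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long sum = 0;
        int next = 0;
        CompletableFuture<List<Integer>> inFlight = client.readBatchAsync(next, batchSize, total);
        while (next < total) {
            List<Integer> batch = inFlight.join(); // wait for the prefetched batch
            next += batch.size();
            if (next < total) {
                // Start the next read before processing, so the two overlap.
                inFlight = client.readBatchAsync(next, batchSize, total);
            }
            for (int entry : batch) {
                sum += entry; // placeholder for real entry processing
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LedgerClient a = sharedClient("zk-1:2181");
        LedgerClient b = sharedClient("zk-1:2181");
        System.out.println(a == b);                 // prints true: both splits share one client
        System.out.println(processSplit(a, 10, 3)); // prints 45 (0 + 1 + ... + 9)
    }
}
```

In this sketch only one read is in flight at a time. A real connector would likely bound the number of outstanding reads and buffered entries to keep memory use in check.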
