### Motivation

1. Currently, the Presto Pulsar connector reads synchronously from BookKeeper
whenever it has run out of entries to process: it processes a batch of entries
and only then reads more. Ideally, reading and processing should happen in
parallel to increase throughput.

2. Each split initializes its own ManagedLedgerFactory/BookKeeper client. We
really only need one BookKeeper client, shared among threads.

### Modifications

1. Rewrote the logic in the Presto Pulsar connector to read asynchronously and
process entries in parallel with reading.

2. Cached the ManagedLedgerFactory so that it is shared across splits.
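
The two modifications can be illustrated with a minimal, self-contained Java sketch. It is not the connector's actual code. `SharedReaderClient`, `SplitReader`, `readBatchAsync`, `readSplit` and `AsyncPrefetchSketch` are hypothetical names, "entries" are simulated as in-memory integers rather than BookKeeper entries, and the bounded `ArrayBlockingQueue` hand-off is only one assumed way to overlap reading with processing. The shared client stands in for the cached ManagedLedgerFactory/BookKeeper client and is created once via the initialization-on-demand holder idiom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an expensive client that all splits share
// (in the PR's terms, the ManagedLedgerFactory / BookKeeper client).
class SharedReaderClient {
    static int creations = 0; // only to show that construction happens once
    // Daemon threads so the demo JVM can exit without an explicit shutdown.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    private SharedReaderClient() { creations++; }

    // Initialization-on-demand holder: created lazily, exactly once, thread-safely.
    private static final class Holder { static final SharedReaderClient INSTANCE = new SharedReaderClient(); }

    static SharedReaderClient get() { return Holder.INSTANCE; }

    // Simulated non-blocking read of entries [start, start + count).
    CompletableFuture<List<Integer>> readBatchAsync(int start, int count) {
        return CompletableFuture.supplyAsync(() -> {
            List<Integer> batch = new ArrayList<>();
            for (int i = start; i < start + count; i++) batch.add(i);
            return batch;
        }, ioPool);
    }
}

// Per-split reader: the next batch is fetched while the caller processes the
// current one; the bounded queue limits how far reads can run ahead.
class SplitReader {
    private static final List<Integer> END = new ArrayList<>(); // end-of-split marker (compared by identity)
    private final SharedReaderClient client;
    private final int totalEntries;
    private final int batchSize;
    private final BlockingQueue<List<Integer>> queue;

    SplitReader(SharedReaderClient client, int totalEntries, int batchSize, int maxBatchesAhead) {
        this.client = client;
        this.totalEntries = totalEntries;
        this.batchSize = batchSize;
        this.queue = new ArrayBlockingQueue<>(maxBatchesAhead);
    }

    private void enqueue(List<Integer> batch) {
        try {
            queue.put(batch); // blocks while the consumer is behind (backpressure)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Chain asynchronous reads: hand off each completed batch, then start the next read.
    // thenComposeAsync keeps the (possibly blocking) hand-off off the consumer thread.
    private CompletableFuture<Void> readFrom(int next) {
        if (next >= totalEntries) {
            enqueue(END);
            return CompletableFuture.completedFuture(null);
        }
        int count = Math.min(batchSize, totalEntries - next);
        return client.readBatchAsync(next, count)
                .thenComposeAsync(batch -> { enqueue(batch); return readFrom(next + count); });
    }

    List<Integer> run() {
        CompletableFuture<Void> reading = readFrom(0)
                .whenComplete((v, err) -> { if (err != null) enqueue(END); }); // avoid hanging on failure
        List<Integer> processed = new ArrayList<>();
        try {
            for (List<Integer> batch = queue.take(); batch != END; batch = queue.take()) {
                for (int entry : batch) processed.add(entry * 2); // stand-in for decoding rows
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        reading.join(); // rethrows any read failure
        return processed;
    }
}

public class AsyncPrefetchSketch {
    static List<Integer> readSplit(int totalEntries, int batchSize) {
        return new SplitReader(SharedReaderClient.get(), totalEntries, batchSize, 2).run();
    }

    public static void main(String[] args) {
        List<Integer> a = readSplit(10, 3);
        List<Integer> b = readSplit(5, 2);
        // prints "10 5 clients=1"
        System.out.println(a.size() + " " + b.size() + " clients=" + SharedReaderClient.creations);
    }
}
```

Running `main` prints `10 5 clients=1`: both simulated splits process all of their entries, and the shared client is constructed only once. The queue capacity (`maxBatchesAhead`, an assumed parameter) caps how many batches can be read ahead of processing. Blocking `put` on a pool thread keeps the sketch short; a real connector would likely prefer a non-blocking hand-off.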

### Result

I see roughly a 2x throughput improvement both on a single node and on a
cluster on AWS (2 brokers, 3 bookies, 4 Presto workers including the coordinator).

[ Full content available at: 
https://github.com/apache/incubator-pulsar/pull/2564 ]