agavra commented on issue #9615: URL: https://github.com/apache/pinot/issues/9615#issuecomment-1284347711
Adding notes from offline discussion with @walterddr @61yao @siddharthteotia. Feel free to add additional details if I missed anything.

Design notes:

1. The shared buffer itself should have a notion of fairness so that it doesn't fill up with data from a single query.
2. The operator chain task pool should wake up on either (a) new incoming data for the corresponding mailbox or (b) a new operator chain being registered for execution, so that there is no race between registering a new operator chain and receiving data from an upstream sending node.
3. The shared buffer should have configurable limits, ideally on both data size and number of blocks.

Discussion around out-of-scope considerations:

1. Retrying failed operator chains/queries (note: cascading a failure is in scope and should leverage [Query Preemption](https://docs.google.com/document/d/1Z9DYAfKznHQI9Wn8BjTWZYTcNRVGiPP0B8aEP3w_1jQ/edit?pli=1)).
2. Implementing parallelism within an operator chain.
3. Implementing pipelining or partition-level parallelism of operator chains.
4. For v1, the operator chain scheduler will be round-robin. It should be pluggable to support priority scheduling that can ensure fairness, but those implementations are out of scope.

Next steps: @agavra to come up with a PEP document with more implementation and design details after some prototyping.
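
To make the design notes concrete, here is a minimal sketch of how the wake-up rule, the buffer limits, and round-robin scheduling could fit together. It is not Pinot code: `RoundRobinChainScheduler`, `register`, `offer`, and `take` are hypothetical names, blocks are modeled as plain counters rather than real data blocks, and only the block-count limit is shown (a data-size limit would need a second counter). The per-chain cap is one assumed way to get fairness, not something decided in the discussion.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch only; none of these names are Pinot APIs.
final class RoundRobinChainScheduler {
  private final int maxTotalBlocks;     // assumed limit on the whole shared buffer
  private final int maxBlocksPerChain;  // assumed per-chain cap, used here for fairness
  private final Map<String, Integer> buffered = new HashMap<>(); // chain id -> blocks waiting
  private final Set<String> registered = new HashSet<>();        // chains ready to execute
  private final Queue<String> ready = new ArrayDeque<>();        // round-robin order
  private int totalBlocks = 0;

  RoundRobinChainScheduler(int maxTotalBlocks, int maxBlocksPerChain) {
    this.maxTotalBlocks = maxTotalBlocks;
    this.maxBlocksPerChain = maxBlocksPerChain;
  }

  // Wake-up condition (b): a new operator chain is registered.
  // Data that arrived before registration makes the chain runnable right away.
  synchronized void register(String chainId) {
    registered.add(chainId);
    if (buffered.getOrDefault(chainId, 0) > 0 && !ready.contains(chainId)) {
      ready.add(chainId);
    }
    notifyAll();
  }

  // Wake-up condition (a): a block arrives for the chain's mailbox.
  // Returns false when either limit is hit, so the sender must back off.
  synchronized boolean offer(String chainId) {
    int held = buffered.getOrDefault(chainId, 0);
    if (totalBlocks >= maxTotalBlocks || held >= maxBlocksPerChain) {
      return false;
    }
    buffered.put(chainId, held + 1);
    totalBlocks++;
    if (registered.contains(chainId) && !ready.contains(chainId)) {
      ready.add(chainId);
    }
    notifyAll();
    return true;
  }

  // Blocks until some registered chain has data, consumes one of its blocks,
  // and moves the chain to the back of the queue if it still has more.
  synchronized String take() {
    while (ready.isEmpty()) {
      try {
        wait();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for work", e);
      }
    }
    String chainId = ready.remove();
    int left = buffered.get(chainId) - 1;
    buffered.put(chainId, left);
    totalBlocks--;
    if (left > 0) {
      ready.add(chainId);
    }
    return chainId;
  }
}
```

Because both `register` and `offer` update state and call `notifyAll()` under the same lock, a worker blocked in `take()` sees a consistent view regardless of which event happens first, which is the race the second design note is concerned with. A pluggable priority scheduler would replace the `ready` queue with a different ordering.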
