gianm commented on issue #12262:
URL: https://github.com/apache/druid/issues/12262#issuecomment-1055125017


   > Gian, Are there any algorithms that would be made possible by this 
architecture? (By algorithms, I mean distributed implementations of relational 
operators, e.g. shuffle-sort, shuffle-join, broadcast-join, hybrid-hash-join, 
use of bloom filters.) If so, can you list the algorithms that you have in mind?
   
   @julianhyde I had in mind that a "partition-sort-shuffle-merge" operation 
would be the basic building block. The "execution coordinator" I mentioned 
should understand those, & for each stage should accept parameters like:
   
   - set of input stages
   - which input stages should be broadcast and which should be partitioned
   - shuffle key for output
   - whether the shuffled output needs to be sorted (if not, we can stream 
through the stage; if so, we need a buffer between stages)
   
   With that building block I think the most useful algorithms to implement at 
first would be shuffle sort (trivial given the building block), sort-based 
aggregation, sort-merge join, broadcast hash join, and shuffle hash join.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to