gianm edited a comment on issue #12262:
URL: https://github.com/apache/druid/issues/12262#issuecomment-1055125017


   > Gian, Are there any algorithms that would be made possible by this 
architecture? (By algorithms, I mean distributed implementations of relational 
operators, e.g. shuffle-sort, shuffle-join, broadcast-join, hybrid-hash-join, 
use of bloom filters.) If so, can you list the algorithms that you have in mind?
   
   @julianhyde I had in mind that a "partition-sort-shuffle-merge" operation 
would be the basic building block. The "execution coordinator" I mentioned 
should understand those, & for each stage should accept parameters like:
   
   - set of input stages (this is what makes it a DAG)
   - which input stages should be broadcast and which should be partitioned
   - shuffle key for output
   - whether the shuffled output needs to be sorted (if not, we can stream 
through the stage; if so, we need a buffer between stages)
   
   With that building block I think the most useful algorithms to implement at 
first would be shuffle sort (trivial given the building block), sort-based 
aggregation, sort-merge join, broadcast hash join, and shuffle hash join.
   
   Btw: I think certain of these algorithms would be hybrid in some way if you 
consider the flow end to end. For example: by "sort-based aggregation" I just 
mean the distributed part. I had imagined the part that happens local to the 
leaf server still being hash-based, followed by a sort-based distributed step.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to