[Article] Pull vs. Push-Based Loop Fusion in Query Engines

Marko Rodriguez Tue, 23 Apr 2019 09:32:13 -0700

Hello,

I just read this article:


Push vs. Pull-Based Loop Fusion in Query Engines
        https://arxiv.org/abs/1610.09166

It is a really good read if you are interested in TP4. Here are some notes I 
jotted down:

        1. Pull-based engines are inefficient when there are many filter() steps.
                - They require a while(!predicate.test(next())) loop to skip 
rejected elements, which adds branching control flow that the JVM has trouble 
optimizing.
                - Push-based engines simply don’t emit() if 
predicate.test() is false. Thus, no skip loop, only a single conditional.
        2. Pull-based engines are better at limit()-based queries.
                - they only process what is necessary to satisfy the limit.
                - push-based engines will provide more results than needed 
given their eager evaluation strategy (backpressure comes into play).
        3. We should introduce a "collection()" operator in TP4 for better 
expressivity with list and map manipulation and so we don’t have to use 
unfold()…fold().
                - [9,11,13].collection(incr().is(gt(10))) => [12,14]
                - the ability to chain functions in a collection manipulation 
sequence is crucial for performance as you don’t create intermediate 
collections.
        4. Given that some bytecode runs best on a push-based engine and some 
on a pull-based one, we can strategize for this accordingly.
                - We have Pipes for pull-based.
                - We have RxJava for push-based.
                - We can even isolate sub-sections of a flow. For instance:
                        g.V().has('age',gt(10)).out('knows').limit(10)
                                ==>becomes
                        g.V().has('age',gt(10)).local(out('knows').limit(10))
                                - where the local(bytecode) (TP3-style) is 
executed by Pipes and the root bytecode by RxJava.
        5. They have lots of good tips for writing performant JVM 
operators/steps/functions.
                - All their work is done in Scala.
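
To make notes 1 and 2 concrete, here is a minimal Java sketch. It is not code 
from the paper or from TinkerPop, and every name in it (PullPushSketch, 
pullFilter, pushFilter, runPull, runPush, Run) is hypothetical. The push side 
is deliberately naive: it has no cancellation or backpressure signal, which a 
real library such as RxJava would provide.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch: all class and method names are illustrative only.
final class PullPushSketch {

    // Pull-based filter: the consumer asks for elements, so the filter
    // must loop past rejected elements before it can answer.
    static <T> Iterator<T> pullFilter(Iterator<T> source, Predicate<T> predicate) {
        return new Iterator<T>() {
            private T buffered;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Skip loop: pull until an element passes or the source is exhausted.
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = buffered;
                buffered = null;
                return result;
            }
        };
    }

    // Push-based filter: the producer drives, so the filter is a single
    // conditional that forwards to the downstream consumer or drops the element.
    static <T> Consumer<T> pushFilter(Predicate<T> predicate, Consumer<T> downstream) {
        return element -> {
            if (predicate.test(element)) downstream.accept(element);
        };
    }

    // Holds the results of one run plus how many source elements were touched.
    static final class Run {
        final List<Integer> results = new ArrayList<>();
        int visited;
    }

    // filter + limit, pull style: stops pulling once the limit is satisfied.
    static Run runPull(List<Integer> data, Predicate<Integer> p, int limit) {
        Run run = new Run();
        Iterator<Integer> raw = data.iterator();
        Iterator<Integer> counting = new Iterator<Integer>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            @Override public Integer next() { run.visited++; return raw.next(); }
        };
        Iterator<Integer> filtered = pullFilter(counting, p);
        while (run.results.size() < limit && filtered.hasNext()) {
            run.results.add(filtered.next());
        }
        return run;
    }

    // filter + limit, naive push style: the source pushes every element and
    // the limit stage ignores the surplus (no cancellation in this sketch).
    static Run runPush(List<Integer> data, Predicate<Integer> p, int limit) {
        Run run = new Run();
        Consumer<Integer> limited = e -> {
            if (run.results.size() < limit) run.results.add(e);
        };
        Consumer<Integer> pipeline = pushFilter(p, limited);
        for (Integer e : data) {
            run.visited++;
            pipeline.accept(e);
        }
        return run;
    }
}
```

With the input [5, 12, 20, 7, 30, 41], the predicate x > 10, and a limit of 2, 
both runs return [12, 20], but runPull touches 3 source elements while runPush 
touches all 6.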

Enjoy!
Marko.

http://rredux.com
