Re: [Article] Pull vs. Push-Based Loop Fusion in Query Engines

Joshua Shinavier Wed, 24 Apr 2019 07:49:10 -0700

Cool to see push vs. pull studied in depth. Often, we simply pick one style
and hope for the best. The cited paper
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.7401&rep=rep1&type=pdf>
on stream fusion is also interesting; it delves into the nuances of
lazy vs. strict evaluation over streams.


Although I have not used Scala, it has occurred to me before that it
might be a better fit than Java for a core TP4 reference API. I understand
that Scala supports both lazy and strict evaluation, and its more advanced
support for higher-kinded types and pattern matching would be an
advantage for encapsulating side effects and working with algebraic data
types (see parallel thread).
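
To make the lazy vs. strict distinction concrete, here is a minimal
sketch in Java (the language TinkerPop is written in today) rather than
Scala. The class and method names are hypothetical and are not part of
any TP4 API. It only counts how often the mapping function runs when the
first n results are requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical illustration only; not TinkerPop code.
final class LazyVsStrictSketch {

    // Strict: square every element into an intermediate list first,
    // then keep the first n results.
    static List<Integer> strictFirstSquares(List<Integer> input, int n,
                                            AtomicInteger calls) {
        List<Integer> squared = new ArrayList<>();
        for (Integer x : input) {
            calls.incrementAndGet();   // count each evaluation
            squared.add(x * x);
        }
        return new ArrayList<>(squared.subList(0, Math.min(n, squared.size())));
    }

    // Lazy: a sequential java.util.stream pipeline evaluates elements on
    // demand, so limit(n) stops the traversal once n results exist.
    static List<Integer> lazyFirstSquares(List<Integer> input, int n,
                                          AtomicInteger calls) {
        return input.stream()
                .map(x -> { calls.incrementAndGet(); return x * x; })
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

With the input [1,2,3,4,5] and n = 2, both methods return [1,4], but the
strict version evaluates the function five times and the lazy one twice.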

Josh


On Tue, Apr 23, 2019 at 9:31 AM Marko Rodriguez wrote:

> Hello,
>
> I just read this article:
>
> Push vs. Pull-Based Loop Fusion in Query Engines
>         https://arxiv.org/abs/1610.09166
>
> It is a really good read if you are interested in TP4. Here are some notes
> I jotted down:
>
>         1. Pull-based engines are inefficient when there are lots of
>            filter()s.
>                 - they require a while(!predicate.test(next())) loop, which
>                   introduces branching control flow and subsequent JVM
>                   performance issues.
>                 - push-based engines simply don’t emit() if
>                   predicate.test() is false. Thus, no retry loop, only a
>                   single conditional.
>         2. Pull-based engines are better at limit()-based queries.
>                 - they only process what is necessary to satisfy the limit.
>                 - push-based engines will produce more results than needed
>                   given their eager evaluation strategy (this is where
>                   backpressure comes into play).
>         3. We should introduce a "collection()" operator in TP4 for better
>            expressivity with list and map manipulation, so we don’t have
>            to use unfold()…fold().
>                 - [9,11,13].collection(incr().is(gt(10))) => [12,14]
>                 - the ability to chain functions in a collection
>                   manipulation sequence is crucial for performance, as you
>                   don’t create intermediate collections.
>         4. Given that some bytecode runs best on a push-based engine and
>            some on a pull-based one, we can strategize accordingly.
>                 - We have Pipes for pull-based.
>                 - We have RxJava for push-based.
>                 - We can even isolate sub-sections of a flow. For instance:
>                         g.V().has('age',gt(10)).out('knows').limit(10)
>                                 ==>becomes
>                         g.V().has('age',gt(10)).local(out('knows').limit(10))
>                                 - where the local(bytecode) (TP3-style) is
>                                   executed by Pipes and the root bytecode
>                                   by RxJava.
>         5. They have lots of good tips for writing performant JVM
>            operators/steps/functions.
>                 - All their work is done in Scala.
>
> Enjoy!,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
>
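Notes 1 and 2 above can be sketched in a few lines of Java. This is only
an illustration under stated assumptions: PushPullSketch and its methods
are hypothetical names, not the Pipes, RxJava, or TP4 API, and the push
side deliberately has no cancellation or backpressure so that the
over-production is visible.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Pipes, RxJava, or TP4 API.
final class PushPullSketch {

    // Pull-based filter: hasNext() must loop over upstream until an
    // element passes the predicate (the loop described in note 1).
    static <T> Iterator<T> pullFilter(Iterator<T> upstream,
                                      Predicate<? super T> predicate) {
        return new Iterator<T>() {
            private T buffered;
            private boolean ready;

            @Override public boolean hasNext() {
                while (!ready && upstream.hasNext()) {
                    T candidate = upstream.next();
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = buffered;
                buffered = null;
                return result;
            }
        };
    }

    // Push-based filter: one conditional per element, no loop. A failing
    // element is simply not forwarded downstream.
    static <T> Consumer<T> pushFilter(Predicate<? super T> predicate,
                                      Consumer<? super T> downstream) {
        return item -> { if (predicate.test(item)) downstream.accept(item); };
    }

    // Pull-based filter + limit: returns how many source items were read.
    static int pullWithLimit(List<Integer> source, Predicate<Integer> predicate,
                             int limit, List<Integer> out) {
        int[] consumed = {0};
        Iterator<Integer> raw = source.iterator();
        Iterator<Integer> counting = new Iterator<Integer>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            @Override public Integer next() { consumed[0]++; return raw.next(); }
        };
        Iterator<Integer> filtered = pullFilter(counting, predicate);
        while (out.size() < limit && filtered.hasNext()) {
            out.add(filtered.next());
        }
        return consumed[0];
    }

    // Push-based filter + limit without cancellation: the source pushes
    // every item even after the limit is reached. Returns items pushed.
    static int pushWithLimit(List<Integer> source, Predicate<Integer> predicate,
                             int limit, List<Integer> out) {
        Consumer<Integer> limited = item -> { if (out.size() < limit) out.add(item); };
        Consumer<Integer> pipeline = pushFilter(predicate, limited);
        int produced = 0;
        for (Integer item : source) {
            produced++;
            pipeline.accept(item);
        }
        return produced;
    }
}
```

For the source [5,12,20,3,30,40], the predicate x > 10, and a limit of 2,
both variants yield [12,20]. The pull variant reads three source items,
while the push variant processes all six, which is the gap that
backpressure or cancellation would have to close.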
