Hello,
I just read this article:
Push vs. Pull-Based Loop Fusion in Query Engines
https://arxiv.org/abs/1610.09166
It is a really good read if you are interested in TP4. Here are some notes I
jotted down:
1. Pull-based engines are inefficient when there are lots of filter() steps.
- they require a while(!predicate.test(next())) loop, which introduces
branch flow control and subsequent JVM performance issues.
- push-based engines simply don’t emit() if the
predicate.test() is false. Thus, no branching.
2. Pull-based engines are better at limit()-based queries.
- they only process what is necessary to satisfy the limit.
- push-based engines will produce more results than needed
given their eager evaluation strategy (this is where backpressure comes into play).
3. We should introduce a "collection()" operator in TP4 for better
expressivity with list and map manipulation and so we don’t have to use
unfold()…fold().
- [9,11,13].collection(incr().is(gt(10))) => [12,14]
- the ability to chain functions in a collection manipulation
sequence is crucial for performance as you don’t create intermediate
collections.
4. Given that some bytecode runs best on a push-based engine and some on a
pull-based one, we can strategize for this accordingly.
- We have Pipes for pull-based.
- We have RxJava for push-based.
- We can even isolate sub-sections of a flow. For instance:
g.V().has('age',gt(10)).out('knows').limit(10)
==>becomes
g.V().has('age',gt(10)).local(out('knows').limit(10))
- where the local(bytecode) (TP3-style) is
executed by Pipes and the root bytecode by RxJava.
5. They have lots of good tips for writing performant JVM
operators/steps/functions.
- All their work is done in Scala.
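To make notes 1-3 concrete, here is a minimal Java sketch. It is not TP4, Pipes, or RxJava code, and it is not taken from the paper: the class PushPullSketch and all of its method names are made up for illustration, a plain Consumer stands in for a push-based operator, and there is no backpressure or cancellation, so the push side always walks the whole source. The last method uses Java streams only as a stand-in for the proposed collection() operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from TinkerPop or the paper.
class PushPullSketch {

    // How many source elements each strategy touched in its last run.
    static int pulled = 0;
    static int pushed = 0;

    // Pull-based filter: the consumer asks for the next element, so the
    // operator has to loop until the predicate passes (note 1).
    static <T> Iterator<T> pullFilter(Iterator<T> source, Predicate<T> predicate) {
        return new Iterator<T>() {
            private T buffered;
            private boolean ready = false;

            @Override
            public boolean hasNext() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    pulled++;
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return buffered;
            }
        };
    }

    // Push-based filter: the producer hands over each element and the
    // operator forwards it only if the predicate passes. No inner loop.
    static <T> Consumer<T> pushFilter(Predicate<T> predicate, Consumer<T> downstream) {
        return element -> {
            if (predicate.test(element)) downstream.accept(element);
        };
    }

    // Pull with limit: stops pulling as soon as the limit is met (note 2).
    static List<Integer> pullWithLimit(List<Integer> data, int limit) {
        pulled = 0;
        Iterator<Integer> it = pullFilter(data.iterator(), x -> x > 10);
        List<Integer> out = new ArrayList<>();
        while (out.size() < limit && it.hasNext()) out.add(it.next());
        return out;
    }

    // Push with limit: without backpressure the sink can only drop extra
    // elements; the source is still pushed to the end (note 2).
    static List<Integer> pushWithLimit(List<Integer> data, int limit) {
        pushed = 0;
        List<Integer> out = new ArrayList<>();
        Predicate<Integer> greaterThanTen = x -> x > 10;
        Consumer<Integer> sink = x -> { if (out.size() < limit) out.add(x); };
        Consumer<Integer> pipeline = pushFilter(greaterThanTen, sink);
        for (Integer x : data) {
            pushed++;
            pipeline.accept(x);
        }
        return out;
    }

    // Stand-in for [9,11,13].collection(incr().is(gt(10))): the map and the
    // filter are chained per element, with no intermediate list (note 3).
    static List<Integer> fusedCollection(List<Integer> data) {
        return data.stream().map(x -> x + 1).filter(x -> x > 10).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(9, 11, 13, 15, 17);
        System.out.println(pullWithLimit(data, 2) + " after touching " + pulled);
        System.out.println(pushWithLimit(data, 2) + " after touching " + pushed);
        System.out.println(fusedCollection(Arrays.asList(9, 11, 13)));
    }
}
```

With the five-element list in main, both strategies return the same two results, but the pull version touches three source elements and the push version touches all five. That gap is what backpressure or cancellation in a real push engine would have to close.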
Enjoy!
Marko.
http://rredux.com