Cool to see push vs. pull studied in depth. Often, we simply pick one style and hope for the best. The cited paper <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.7401&rep=rep1&type=pdf> on stream fusion is interesting too; it delves into the nuance of lazy vs. strict evaluation over streams.
Although I have not used Scala, it has previously occurred to me that it might be a better fit than Java for a core TP4 reference API. I understand that Scala supports both lazy and strict evaluation, and the more advanced support for higher-kinded types and functional pattern matching would be an advantage for encapsulating side-effects and working with algebraic data types (see parallel thread).

Josh

On Tue, Apr 23, 2019 at 9:31 AM Marko Rodriguez <[email protected]> wrote:

> Hello,
>
> I just read this article:
>
>     Push vs. Pull-Based Loop Fusion in Query Engines
>     https://arxiv.org/abs/1610.09166
>
> It is a really good read if you are interested in TP4. Here are some notes
> I jotted down:
>
>     1. Pull-based engines are inefficient when there are lots of filters().
>         - they require a while(predicate.test(next())) which introduces
>           branch flow control and subsequent JVM performance issues.
>         - push-based engines simply don’t emit() if the predicate.test()
>           is false. Thus, no branching.
>     2. Pull-based engines are better at limit() based queries.
>         - they only process what is necessary to satisfy the limit.
>         - push-based engines will provide more results than needed given
>           their eager evaluation strategy (backpressure comes into play).
>     3. We should introduce a "collection()" operator in TP4 for better
>        expressivity with list and map manipulation and so we don’t have to
>        use unfold()…fold().
>         - [9,11,13].collection(incr().is(gt(10))) => [12,14]
>         - the ability to chain functions in a collection manipulation
>           sequence is crucial for performance as you don’t create
>           intermediate collections.
>     4. Given that some bytecode is best on a push-based vs. a pull-based
>        (and vice versa), we can strategize for this accordingly.
>         - We have Pipes for pull-based.
>         - We have RxJava for push-based.
>         - We can even isolate sub-sections of a flow. For instance:
>             g.V().has('age',gt(10)).out('knows').limit(10)
>               ==>becomes
>             g.V().has('age',gt(10)).local(out('knows').limit(10))
>             - where the local(bytecode) (TP3-style) is executed by Pipes
>               and the root bytecode by rxJava.
>     5. They have lots of good tips for writing JVM performant
>        operators/steps/functions.
>         - All their work is done in Scala.
>
> Enjoy!,
> Marko.
>
> http://rredux.com
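
To make notes 1 through 3 above concrete, here is a minimal sketch in plain Java. It is not TP4, Pipes, or RxJava code: the class PushPullSketch and its methods (pull, push, fused) are hypothetical names made up for illustration, and the push side is a deliberately naive source with no cancellation or backpressure. The visited counter records how many source elements each style touches before a limit is satisfied. The fused method only approximates the proposed collection() operator using java.util.stream, which chains the increment and the filter without building an intermediate list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PushPullSketch {

    // Pull-based filter: the consumer asks for the next element, and the
    // filter has to loop over its upstream until one passes the predicate.
    static final class FilterIterator<T> implements Iterator<T> {
        private final Iterator<T> upstream;
        private final Predicate<T> predicate;
        private T pending;
        private boolean hasPending;

        FilterIterator(Iterator<T> upstream, Predicate<T> predicate) {
            this.upstream = upstream;
            this.predicate = predicate;
        }

        @Override
        public boolean hasNext() {
            while (!hasPending && upstream.hasNext()) {
                T candidate = upstream.next();
                if (predicate.test(candidate)) {
                    pending = candidate;
                    hasPending = true;
                }
            }
            return hasPending;
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            hasPending = false;
            T result = pending;
            pending = null;
            return result;
        }
    }

    // Pull: stops reading the source as soon as the limit is reached.
    static List<Integer> pull(List<Integer> source, Predicate<Integer> p, int limit, int[] visited) {
        final Iterator<Integer> raw = source.iterator();
        Iterator<Integer> counted = new Iterator<Integer>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            @Override public Integer next() { visited[0]++; return raw.next(); }
        };
        Iterator<Integer> filtered = new FilterIterator<>(counted, p);
        List<Integer> out = new ArrayList<>();
        while (out.size() < limit && filtered.hasNext()) {
            out.add(filtered.next());
        }
        return out;
    }

    // Push: the source drives every element downstream. The filter stage has
    // no inner loop (it just declines to emit), but this naive source keeps
    // pushing after the limit is full because nothing tells it to stop.
    static List<Integer> push(List<Integer> source, Predicate<Integer> p, int limit, int[] visited) {
        List<Integer> out = new ArrayList<>();
        Consumer<Integer> limitStage = x -> { if (out.size() < limit) out.add(x); };
        Consumer<Integer> filterStage = x -> { if (p.test(x)) limitStage.accept(x); };
        for (Integer x : source) {
            visited[0]++;
            filterStage.accept(x);
        }
        return out;
    }

    // Rough analogue of [9,11,13].collection(incr().is(gt(10))):
    // increment, then keep values greater than 10, in one chained pass.
    static List<Integer> fused(List<Integer> in) {
        return in.stream().map(x -> x + 1).filter(x -> x > 10).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        int[] pulled = {0};
        int[] pushed = {0};
        System.out.println(pull(source, x -> x > 3, 2, pulled) + " after reading " + pulled[0]);
        System.out.println(push(source, x -> x > 3, 2, pushed) + " after reading " + pushed[0]);
        System.out.println(fused(Arrays.asList(9, 11, 13)));
    }
}
```

With a ten-element source, a filter of x > 3, and a limit of 2, both styles return the same two results, but the pull version reads 5 source elements and the naive push version reads all 10. A real push-based engine would need some form of cancellation or backpressure to close that gap, which is the trade-off note 2 points at.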
