Thanks for the explanation, and yes, I was wondering if Pipes could be removed
altogether. I was loose with my language and reverted to the TP3 step concept,
but I understand the direction you're going with compilations, and I think
separating the structure of a query from the execution implementation is a big
win. The Pipes implementation will no doubt be faster for executing a single
traversal, but I'm wondering whether the RxJava Flowable processor would beat
the Pipes processor by providing higher throughput when many, many users are
executing queries concurrently. Regardless, providers having multiple processor
options is a good thing, and I don't mean to suggest any premature
optimization. For now I'll put together some simple benchmarks out of
curiosity and report back.
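
Something like the following is what I have in mind for the concurrent case --
a rough sketch against plain RxJava 2.x with a stand-in query rather than the
actual TP4 processor APIs. N client threads each run the same small pipeline in
a loop for a fixed window, and we compare completed queries per second with the
Flowable version versus a Pipes-style equivalent swapped in.

    import io.reactivex.Flowable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class ThroughputSketch {

      // One "query": a trivial map/filter pipeline over 100k integers, built with RxJava.
      static long runRxQuery() {
        return Flowable.range(0, 100_000)
            .map(i -> i * 2)
            .filter(i -> i % 3 == 0)
            .count()
            .blockingGet();
      }

      public static void main(String[] args) throws Exception {
        final int clients = 32;                      // concurrent "users"
        final ExecutorService pool = Executors.newFixedThreadPool(clients);
        final AtomicLong completed = new AtomicLong();
        final long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(30);

        for (int c = 0; c < clients; c++) {
          pool.submit(() -> {
            while (System.nanoTime() < end) {
              runRxQuery();                          // swap in the Pipes-style version here
              completed.incrementAndGet();
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println("queries/sec ~= " + completed.get() / 30.0);
      }
    }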

--Ted

On Thu, Apr 4, 2019 at 12:36 PM Marko Rodriguez <[email protected]> wrote:

> Hi,
>
> This is a pretty neat explanation of why Pipes will be faster than RxJava
> single-threaded.
>
> The map-operator for Pipes:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
>
> The map-operator for RxJava:
> https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
>
> RxJava has a lot of overhead. Pipes is as bare bones as you can get.
>
> Marko.
>
> http://rredux.com
>
>
> > On Apr 4, 2019, at 11:07 AM, Marko Rodriguez <[email protected]> wrote:
> >
> > Hello,
> >
> > Thank you for the response.
> >
> >> Excellent progress on the RxJava processor. I was wondering if
> >> categories 1 and 2 can be combined where Pipes becomes the Flowable
> >> version of the RxJava processor?
> >
> > I don't quite understand your questions. Are you saying:
> >
> > Flowable.of().flatMap(pipesProcessor)
> >
> > or are you saying:
> >
> > "Get rid of Pipes altogether and just use single-threaded RxJava instead."
> >
> > For the first, I don't see the benefit of that. For the second, Pipes4
> > is really fast! -- much faster than Pipes3. (more on this next)
> >
> >> In this case, though single threaded, we'd still get the benefit of
> >> asynchronous execution of traversal steps versus blocking execution on
> >> thread pools like the current TP3 model.
> >
> > Again, I'm confused. Apologies. I believe that perhaps you think that
> > the Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM.
> > If so, note that this is not the case. The concept of Steps (chained
> > iterators) is completely within the pipes/ package. The machine-core/
> > package compiles Bytecode to a nested List of stateless, unconnected
> > functions (called a Compilation). It is this intermediate representation
> > that is ultimately used by Pipes, RxJava, and Beam to create their
> > respective execution plans (where Pipes does the whole chained-iterator
> > step thing).
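
The chained-iterator model described above reduces to something like the
following -- a minimal sketch of the pattern, not the actual MapStep class
linked earlier. Each next() is a plain method call into the upstream step, with
no subscription, request accounting, or operator-fusion machinery in between,
which is the per-element overhead that RxJava's FlowableMap carries.

    import java.util.Iterator;
    import java.util.function.Function;

    // A chained-iterator map step: wrap the upstream iterator and apply a
    // function per element. No queues, no schedulers, no backpressure --
    // pulling an element is just a nested method call.
    public class MapIterator<S, E> implements Iterator<E> {
      private final Iterator<S> previous;
      private final Function<S, E> function;

      public MapIterator(final Iterator<S> previous, final Function<S, E> function) {
        this.previous = previous;
        this.function = function;
      }

      @Override
      public boolean hasNext() {
        return this.previous.hasNext();
      }

      @Override
      public E next() {
        return this.function.apply(this.previous.next());
      }
    }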

> > Compilation:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
> >
> > Pipes:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
> >
> > Beam:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
> >
> > RxJava:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
> >
> >> I would imagine Pipes would beat the Flowable performance on a single
> >> traversal side-by-side basis (though perhaps not by much), but the
> >> Flowable version would likely scale up to higher throughput and better
> >> CPU utilization when under concurrent load.
> >
> > Pipes is definitely faster than RxJava (single-threaded). While I only
> > learned RxJava 36 hours ago, I don't believe it will ever beat Pipes
> > because Pipes4 is brain-dead simple -- much simpler than in TP3, where a
> > bunch of extra data structures were needed to account for GraphComputer
> > semantics (e.g. ExpandableIterator).
> >
> > I believe, given the CPU utilization/etc. points you make, that RxJava
> > will come into its own in multi-threaded mode (called ParallelFlowable)
> > when trying to get real-time performance from a query that
> > touches/generates lots of data (traversers). This is the reason for
> > Category 2 -- real-time, multi-threaded, single machine. I only gave a
> > quick pass last night at making ParallelFlowable work, but gave up when
> > various test cases were failing (I now believe I know the reason why). I
> > hope to have ParallelFlowable working by mid-week next week and then we
> > can benchmark its performance.
> >
> > I hope I answered your questions or at least explained my confusion.
> >
> > Thanks,
> > Marko.
> >
> > http://rredux.com
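
For reference, the ParallelFlowable mode mentioned above is reached in RxJava
2.x roughly as follows -- a generic sketch, not the TP4 RxJava processor itself:

    import io.reactivex.Flowable;
    import io.reactivex.schedulers.Schedulers;
    import java.util.List;

    public class ParallelSketch {
      public static void main(String[] args) {
        List<Integer> result = Flowable.range(0, 1_000_000)
            .parallel()                          // split into rails, one per CPU by default
            .runOn(Schedulers.computation())     // each rail runs on its own worker thread
            .map(i -> i * 2)
            .filter(i -> i % 3 == 0)
            .sequential()                        // merge the rails back into a single Flowable
            .toList()
            .blockingGet();
        System.out.println(result.size());
      }
    }

Note that parallel() does not preserve element order across rails, so any
order-sensitive semantics have to be dealt with after sequential() merges the
results back together.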

> >
> >> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> --Ted
> >>
> >> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
> >>> (OLAP) execution models. In TP4, if a processing engine can convert a
> >>> bytecode Compilation into a working execution plan, then that is all
> >>> that matters. TinkerPop does not need to concern itself with whether
> >>> that execution plan is "OLTP" or "OLAP", or with the semantics of its
> >>> execution (function oriented, iterator oriented, RDD-based, etc.).
> >>> With that, here are 4 categories of processors that I believe define
> >>> the full spectrum of what we will be dealing with:
> >>>
> >>> 1. Real-time single-threaded single-machine.
> >>>    * This is STANDARD (OLTP) in TP3.
> >>>    * This is the Pipes processor in TP4.
> >>>
> >>> 2. Real-time multi-threaded single-machine.
> >>>    * This does not exist in TP3.
> >>>    * We should provide an RxJava processor in TP4.
> >>>
> >>> 3. Near-time distributed multi-machine.
> >>>    * This does not exist in TP3.
> >>>    * We should provide an Akka processor in TP4.
> >>>
> >>> 4. Batch-time distributed multi-machine.
> >>>    * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
> >>>    * We should provide a Spark processor in TP4.
> >>>
> >>> I'm not familiar with the specifics of the Flink, Apex, DataFlow,
> >>> Samza, etc. stream-based processors. However, I believe they can be
> >>> made to work in near-time or batch-time depending on the amount of
> >>> data pulled from the database. Once we understand these technologies
> >>> better, I believe we should be able to fit them into the categories
> >>> above.
> >>>
> >>> In conclusion: Do these categories make sense to people?
> >>> Terminology-wise -- Near-time? Batch-time? Are these distinctions
> >>> valid?
> >>>
> >>> Thank you,
> >>> Marko.
> >>>
> >>> http://rredux.com
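
To make the batch-time end of the spectrum concrete, a category 4 processor
runs the whole pipeline as a distributed job and only materializes results when
the job finishes, along the lines of this plain Spark sketch (illustrative
only, not the TP4 Spark processor):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import java.util.Arrays;

    public class BatchSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("batch-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          long count = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6))
              .map(i -> i * 2)          // each stage becomes a distributed transformation
              .filter(i -> i % 3 == 0)
              .count();                 // results only come back when the whole job completes
          System.out.println(count);
        }
      }
    }

The contrast with category 1 is the latency model: Pipes streams traversers
back one next() call at a time, while a batch job schedules stages across a
cluster and returns everything at the end.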
