Hi,
Here is a pretty neat illustration of why Pipes will be faster than
single-threaded RxJava.
The map-operator for Pipes:
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
The map-operator for RxJava:
https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
RxJava has a lot of overhead. Pipes is as bare bones as you can get.
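The difference is easy to see in miniature. Below is a toy, JDK-only sketch (illustrative names; not the actual TP4 MapStep source) of the Pipes approach: a map step is just a chained iterator that pulls a traverser from its upstream step and applies a function, with none of the Reactive Streams subscription/backpressure machinery:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Toy sketch of a Pipes-style map step (not the TP4 source): a plain
// chained iterator. No subscriptions, no request(n) accounting, no
// onNext/onError/onComplete protocol -- just hasNext()/next().
public final class MapStepSketch<S, E> implements Iterator<E> {

    private final Iterator<S> previous;    // the upstream step
    private final Function<S, E> function; // the map function

    public MapStepSketch(final Iterator<S> previous, final Function<S, E> function) {
        this.previous = previous;
        this.function = function;
    }

    @Override
    public boolean hasNext() {
        return this.previous.hasNext();
    }

    @Override
    public E next() {
        return this.function.apply(this.previous.next());
    }

    // Helper: drain a source through a single map step.
    public static <S, E> List<E> drain(final List<S> source, final Function<S, E> function) {
        final Iterator<E> step = new MapStepSketch<>(source.iterator(), function);
        final List<E> results = new ArrayList<>();
        while (step.hasNext())
            results.add(step.next());
        return results;
    }

    public static void main(final String[] args) {
        System.out.println(drain(Arrays.asList(1, 2, 3), x -> x * 10)); // [10, 20, 30]
    }
}
```

Every next() is one virtual call plus the function application; there is no per-element protocol to pay for.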
Marko.
http://rredux.com
> On Apr 4, 2019, at 11:07 AM, Marko Rodriguez <[email protected]> wrote:
>
> Hello,
>
> Thank you for the response.
>
>> Excellent progress on the RxJava processor. I was wondering if
>> categories 1 and 2 can be combined where Pipes becomes the Flowable version
>> of the RxJava processor?
>
> I don’t quite understand your question. Are you saying:
>
> Flowable.of().flatMap(pipesProcessor)
>
> or are you saying:
>
> “Get rid of Pipes altogether and just use single-threaded RxJava
> instead.”
>
> For the first, I don’t see the benefit of that. For the second, Pipes4 is
> really fast! — much faster than Pipes3. (more on this next)
>
>
>> In this case, though single threaded, we'd still
>> get the benefit of asynchronous execution of traversal steps versus
>> blocking execution on thread pools like the current TP3 model.
>
> Again, I’m confused. Apologies. I believe that perhaps you think that the
> Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM. If so,
> note that this is not the case. The concept of Steps (chained iterators) is
> completely within the pipes/ package. The machine-core/ package compiles
> Bytecode to a nested List of stateless, unconnected functions (called a
> Compilation). It is this intermediate representation that ultimately is used
> by Pipes, RxJava, and Beam to create their respective execution plan (where
> Pipes does the whole chained iterator step thing).
>
> Compilation:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
>
> Pipes:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
>
> Beam:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
>
> RxJava:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
>
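The Compilation-to-plan hand-off can be sketched in plain Java (a toy model with illustrative names, not the TP4 API): the machine core produces an ordered list of stateless functions, and a Pipes-like processor folds that list into a chained-iterator execution plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Toy sketch of the compile-then-plan split (names are illustrative, not
// the TP4 API): the "Compilation" is a flat, ordered list of stateless
// functions; a Pipes-like processor turns it into an execution plan by
// wrapping each function around the iterator chain built so far.
public final class CompilationSketch {

    // Fold the function list into one iterator chain over the source.
    public static <S> Iterator<S> plan(final Iterator<S> source,
                                       final List<Function<S, S>> compilation) {
        Iterator<S> head = source;
        for (final Function<S, S> function : compilation) {
            final Iterator<S> previous = head;
            head = new Iterator<S>() {
                public boolean hasNext() { return previous.hasNext(); }
                public S next() { return function.apply(previous.next()); }
            };
        }
        return head;
    }

    public static List<Integer> run(final List<Integer> input) {
        // e.g. the bytecode [incr, incr, square] compiled to three functions
        final List<Function<Integer, Integer>> compilation = Arrays.asList(
                x -> x + 1, x -> x + 1, x -> x * x);
        final Iterator<Integer> plan = plan(input.iterator(), compilation);
        final List<Integer> results = new ArrayList<>();
        while (plan.hasNext())
            results.add(plan.next());
        return results;
    }

    public static void main(final String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3))); // [9, 16, 25]
    }
}
```

The same function list could just as well be handed to an RxJava or Beam planner; only the fold step differs per processor.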
>> I would
>> imagine Pipes would beat the Flowable performance on a single traversal
>> side-by-side basis (though perhaps not by much), but the Flowable version
>> would likely scale up to higher throughput and better CPU utilization when
>> under concurrent load.
>
>
> Pipes is definitely faster than RxJava (single-threaded). While I only
> learned RxJava 36 hours ago, I don’t believe it will ever beat Pipes because
> Pipes4 is brain-dead simple — much simpler than in TP3, where a bunch of
> extra data structures were needed to account for GraphComputer semantics
> (e.g. ExpandableIterator).
>
> I believe, given the CPU utilization/etc. points you make, that RxJava will
> come into its own in multi-threaded mode (called ParallelFlowable) when
> trying to get real-time performance from a query that touches/generates lots
> of data (traversers). This is the reason for Category 2 — real-time,
> multi-threaded, single machine. I took a quick pass last night at making
> ParallelFlowable work, but gave up when various test cases were failing (I
> now believe I know why). I hope to have ParallelFlowable working by the
> middle of next week, and then we can benchmark its performance.
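For reference, ParallelFlowable is driven through Flowable.parallel(), ParallelFlowable.runOn(Scheduler), and ParallelFlowable.sequential() in RxJava 2.x. The split/map/merge shape it implements can be sketched with just the JDK (illustrative names only; this is not the RxJava API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// JDK-only sketch of the split/map/merge shape behind ParallelFlowable
// (illustrative -- NOT the RxJava API): round-robin the traverser stream
// onto N "rails", map each rail on its own thread, then merge the rails
// back into one ordered stream.
public final class ParallelMapSketch {

    public static <S, E> List<E> parallelMap(final List<S> source,
                                             final Function<S, E> function,
                                             final int rails) {
        final ExecutorService pool = Executors.newFixedThreadPool(rails);
        try {
            // split: round-robin the source across the rails
            final List<List<S>> partitions = new ArrayList<>();
            for (int i = 0; i < rails; i++)
                partitions.add(new ArrayList<>());
            for (int i = 0; i < source.size(); i++)
                partitions.get(i % rails).add(source.get(i));
            // map: one task per rail
            final List<Future<List<E>>> futures = new ArrayList<>();
            for (final List<S> partition : partitions)
                futures.add(pool.submit(() -> {
                    final List<E> out = new ArrayList<>();
                    for (final S s : partition)
                        out.add(function.apply(s));
                    return out;
                }));
            // merge: interleave the rails back in round-robin order
            final List<List<E>> mapped = new ArrayList<>();
            for (final Future<List<E>> future : futures)
                mapped.add(future.get());
            final List<E> merged = new ArrayList<>();
            for (int i = 0; i < source.size(); i++)
                merged.add(mapped.get(i % rails).get(i / rails));
            return merged;
        } catch (final Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(final String[] args) {
        System.out.println(parallelMap(Arrays.asList(1, 2, 3, 4, 5), x -> x * x, 2));
        // [1, 4, 9, 16, 25]
    }
}
```

The per-element coordination cost of the split and merge is exactly why the multi-threaded mode only pays off on queries that touch lots of traversers.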
>
> I hope I answered your questions or at least explained my confusion.
>
> Thanks,
> Marko.
>
> http://rredux.com
>
>
>
>
>> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <[email protected]> wrote:
>>
>> Hello,
>>
>>
>> --Ted
>>
>> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
>>> (OLAP) execution models. In TP4, if a processing engine can convert a
>>> bytecode Compilation into a working execution plan then that is all that
>>> matters. TinkerPop does not need to concern itself with whether that
>>> execution plan is “OLTP" or “OLAP" or with the semantics of its execution
>>> (function oriented, iterator oriented, RDD-based, etc.). With that, here
>>> are 4 categories of processors that I believe define the full spectrum of
>>> what we will be dealing with:
>>>
>>> 1. Real-time single-threaded single-machine.
>>> * This is STANDARD (OLTP) in TP3.
>>> * This is the Pipes processor in TP4.
>>>
>>> 2. Real-time multi-threaded single-machine.
>>> * This does not exist in TP3.
>>> * We should provide an RxJava processor in TP4.
>>>
>>> 3. Near-time distributed multi-machine.
>>> * This does not exist in TP3.
>>> * We should provide an Akka processor in TP4.
>>>
>>> 4. Batch-time distributed multi-machine.
>>> * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
>>> * We should provide a Spark processor in TP4.
>>>
>>> I’m not familiar with the specifics of the Flink, Apex, DataFlow, Samza,
>>> etc. stream-based processors. However, I believe they can be made to work
>>> in near-time or batch-time depending on the amount of data pulled from the
>>> database. Once we understand these technologies better, I believe we will
>>> be able to fit them into the categories above.
>>>
>>> In conclusion: Do these categories make sense to people? Terminology-wise
>>> -- Near-time? Batch-time? Are these distinctions valid?
>>>
>>> Thank you,
>>> Marko.
>>>
>>> http://rredux.com
>