Hello,
Thank you for the response.
> Excellent progress on the RxJava processor. I was wondering if
> categories 1 and 2 can be combined where Pipes becomes the Flowable version
> of the RxJava processor?
I don’t quite understand your question. Are you saying:
Flowable.of().flatMap(pipesProcessor)
or are you saying:
“Get rid of Pipes altogether and just use single-threaded RxJava
instead.”
For the first, I don’t see the benefit of that. For the second, Pipes4 is
really fast! — much faster than Pipes3. (more on this below)
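In case the first reading is what you meant, here is a stdlib sketch (java.util.stream standing in for Flowable, and a hypothetical pipesProcessor method, since I’m not quoting real signatures) of what wrapping a Pipes-style processor in a flatMap would look like: the outer stream adds a layer of indirection but no new capability.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapSketch {
    // A stand-in for "pipesProcessor": for each incoming traverser it
    // returns whatever the inner chained-iterator pipeline would emit.
    static Stream<Integer> pipesProcessor(Integer start) {
        return Stream.of(start + 1).map(x -> x * 2);
    }

    public static void main(String[] args) {
        // The outer "reactive" stream just hands each element to the
        // inner Pipes-like pipeline and merges the results back.
        List<Integer> out = Stream.of(1, 2, 3)
            .flatMap(FlatMapSketch::pipesProcessor)
            .collect(Collectors.toList());
        System.out.println(out);
    }
}
```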
> In this case, though single threaded, we'd still
> get the benefit of asynchronous execution of traversal steps versus
> blocking execution on thread pools like the current TP3 model.
Again, I’m confused. Apologies. Perhaps you think that the Step model of
Pipes is what Bytecode gets compiled to in the TP4 VM. If so, note that this
is not the case. The concept of Steps (chained iterators) lives entirely
within the pipes/ package. The machine-core/ package compiles Bytecode to a
nested list of stateless, unconnected functions (called a Compilation). It is
this intermediate representation that Pipes, RxJava, and Beam each use to
create their respective execution plans (where Pipes does the whole
chained-iterator Step thing).
Compilation:
https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
Pipes:
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
Beam:
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
RxJava:
https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
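To make the distinction concrete, here is a minimal stdlib-only sketch, with purely illustrative names (not the real TP4 classes), of a Compilation-like list of stateless functions and a Pipes-style processor that chains them into nested iterators:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class Sketch {
    // A Pipes-style processor: wrap the source in one iterator per
    // function, producing a chain of nested (chained) iterators.
    static Iterator<Integer> pipes(Iterator<Integer> source,
                                   List<Function<Integer, Integer>> compilation) {
        Iterator<Integer> current = source;
        for (Function<Integer, Integer> f : compilation) {
            final Iterator<Integer> upstream = current;
            current = new Iterator<Integer>() {
                @Override public boolean hasNext() { return upstream.hasNext(); }
                @Override public Integer next() { return f.apply(upstream.next()); }
            };
        }
        return current;
    }

    public static void main(String[] args) {
        // A stand-in for a Compilation: an ordered list of stateless,
        // unconnected functions (illustrative, not the real class).
        List<Function<Integer, Integer>> compilation =
            List.of(x -> x + 1, x -> x * 2);
        Iterator<Integer> out = pipes(List.of(1, 2, 3).iterator(), compilation);
        while (out.hasNext())
            System.out.println(out.next());
    }
}
```

The same list of functions, untouched, could just as easily be handed to a reactive or distributed processor, which is the point of keeping the intermediate representation stateless.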
> I would
> imagine Pipes would beat the Flowable performance on a single traversal
> side-by-side basis (though perhaps not by much), but the Flowable version
> would likely scale up to higher throughput and better CPU utilization when
> under concurrent load.
Pipes is definitely faster than RxJava (single-threaded). While I only learned
RxJava 36 hours ago, I don’t believe it will ever beat Pipes, because Pipes4 is
brain-dead simple — much simpler than in TP3, where a bunch of extra data
structures were needed to account for GraphComputer semantics (e.g.
ExpandableIterator).
I believe, given the CPU-utilization points you make, that RxJava will come
into its own in multi-threaded mode (via ParallelFlowable) when trying to get
real-time performance from a query that touches/generates lots of data
(traversers). This is the reason for Category 2: real-time, multi-threaded,
single-machine. I only gave a quick pass last night at making ParallelFlowable
work, but gave up when various test cases were failing (I now believe I know
why). I hope to have ParallelFlowable working by mid-week next week, and then
we can benchmark its performance.
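To illustrate why the stateless-function representation should parallelize so naturally (using java.util.stream as a stdlib stand-in, not the actual RxJava ParallelFlowable API):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelSketch {
    public static void main(String[] args) {
        // Two illustrative stateless "instructions" from a compilation.
        Function<Integer, Integer> inc = x -> x + 1;
        Function<Integer, Integer> dbl = x -> x * 2;

        // Because the functions carry no per-step state, the stream can
        // be split across worker threads without any coordination;
        // collect() restores the encounter order at the end.
        List<Integer> results = Stream.of(1, 2, 3)
            .parallel()
            .map(inc.andThen(dbl))
            .collect(Collectors.toList());

        System.out.println(results);
    }
}
```

ParallelFlowable works along similar lines (rails of workers fed from one upstream), which is why I expect it to shine on traverser-heavy queries.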
I hope I answered your questions or at least explained my confusion.
Thanks,
Marko.
http://rredux.com
> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <[email protected]> wrote:
>
> Hello,
>
>
> --Ted
>
> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <[email protected]> wrote:
>
>> Hello,
>>
>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
>> (OLAP) execution models. In TP4, if a processing engine can convert a
>> bytecode Compilation into a working execution plan then that is all that
>> matters. TinkerPop does not need to concern itself with whether that
>> execution plan is “OLTP” or “OLAP” or with the semantics of its execution
>> (function-oriented, iterator-oriented, RDD-based, etc.). With that, here
>> are 4 categories of processors that I believe define the full spectrum of
>> what we will be dealing with:
>>
>> 1. Real-time single-threaded single-machine.
>> * This is STANDARD (OLTP) in TP3.
>> * This is the Pipes processor in TP4.
>>
>> 2. Real-time multi-threaded single-machine.
>> * This does not exist in TP3.
>> * We should provide an RxJava processor in TP4.
>>
>> 3. Near-time distributed multi-machine.
>> * This does not exist in TP3.
>> * We should provide an Akka processor in TP4.
>>
>> 4. Batch-time distributed multi-machine.
>> * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
>> * We should provide a Spark processor in TP4.
>>
>> I’m not familiar with the specifics of the Flink, Apex, DataFlow, Samza,
>> etc. stream-based processors. However, I believe they can be made to work
>> in near-time or batch-time depending on the amount of data pulled from the
>> database. Once we understand these technologies better, I believe we
>> should be able to fit them into the categories above.
>>
>> In conclusion: Do these categories make sense to people? Terminology-wise
>> -- Near-time? Batch-time? Are these distinctions valid?
>>
>> Thank you,
>> Marko.
>>
>> http://rredux.com