Thanks for the explanation, and yes, I was wondering if Pipes could be removed
altogether. I was loose with my language and reverted to the TP3 step concept,
but I understand the direction you're going with compilations, and I think
separating the structure of a query from the execution implementation is a big
win. The Pipes implementation will no doubt be faster for executing a single
traversal, but I'm wondering whether the RxJava Flowable processor would beat
the Pipes processor by providing higher throughput when many, many users are
executing queries concurrently. Regardless, providers having multiple processor
options is a good thing, and I don't mean to suggest any premature
optimization. For now I'll put together some simple benchmarks out of
curiosity and report back.
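
Something like the following is what I have in mind for the concurrent case --
a rough sketch against plain RxJava 2.x with a stand-in query rather than the
actual TP4 processor APIs. N client threads each run the same small pipeline in
a loop for a fixed window, and we compare completed queries per second with the
Flowable version versus a Pipes-style equivalent swapped in.

    import io.reactivex.Flowable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class ThroughputSketch {

      // One "query": a trivial map/filter pipeline over 100k integers, built with RxJava.
      static long runRxQuery() {
        return Flowable.range(0, 100_000)
            .map(i -> i * 2)
            .filter(i -> i % 3 == 0)
            .count()
            .blockingGet();
      }

      public static void main(String[] args) throws Exception {
        final int clients = 32;                      // concurrent "users"
        final ExecutorService pool = Executors.newFixedThreadPool(clients);
        final AtomicLong completed = new AtomicLong();
        final long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(30);

        for (int c = 0; c < clients; c++) {
          pool.submit(() -> {
            while (System.nanoTime() < end) {
              runRxQuery();                          // swap in the Pipes-style version here
              completed.incrementAndGet();
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println("queries/sec ~= " + completed.get() / 30.0);
      }
    }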

--Ted

On Thu, Apr 4, 2019 at 12:36 PM Marko Rodriguez <[email protected]> wrote:

> Hi,
>
> This is a pretty neat explanation of why Pipes will be faster than RxJava
> single-threaded.
>
> The map-operator for Pipes:
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
>
> The map-operator for RxJava:
> https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
>
> RxJava has a lot of overhead. Pipes is as bare bones as you can get.
>
> Marko.
>
> http://rredux.com
>
>
> > On Apr 4, 2019, at 11:07 AM, Marko Rodriguez <[email protected]> wrote:
> >
> > Hello,
> >
> > Thank you for the response.
> >
> >> Excellent progress on the RxJava processor. I was wondering if
> >> categories 1 and 2 can be combined where Pipes becomes the Flowable
> >> version of the RxJava processor?
> >
> > I don't quite understand your questions. Are you saying:
> >
> > Flowable.of().flatMap(pipesProcessor)
> >
> > or are you saying:
> >
> > "Get rid of Pipes altogether and just use single-threaded RxJava instead."
> >
> > For the first, I don't see the benefit of that. For the second, Pipes4
> > is really fast! -- much faster than Pipes3. (more on this next)
> >
> >> In this case, though single threaded, we'd still get the benefit of
> >> asynchronous execution of traversal steps versus blocking execution on
> >> thread pools like the current TP3 model.
> >
> > Again, I'm confused. Apologies. I believe that perhaps you think that
> > the Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM.
> > If so, note that this is not the case. The concept of Steps (chained
> > iterators) is completely within the pipes/ package. The machine-core/
> > package compiles Bytecode to a nested List of stateless, unconnected
> > functions (called a Compilation). It is this intermediate representation
> > that is ultimately used by Pipes, RxJava, and Beam to create their
> > respective execution plans (where Pipes does the whole chained-iterator
> > step thing).
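
The chained-iterator model described above reduces to something like the
following -- a minimal sketch of the pattern, not the actual MapStep class
linked earlier. Each next() is a plain method call into the upstream step, with
no subscription, request accounting, or operator-fusion machinery in between,
which is the per-element overhead that RxJava's FlowableMap carries.

    import java.util.Iterator;
    import java.util.function.Function;

    // A chained-iterator map step: wrap the upstream iterator and apply a
    // function per element. No queues, no schedulers, no backpressure --
    // pulling an element is just a nested method call.
    public class MapIterator<S, E> implements Iterator<E> {
      private final Iterator<S> previous;
      private final Function<S, E> function;

      public MapIterator(final Iterator<S> previous, final Function<S, E> function) {
        this.previous = previous;
        this.function = function;
      }

      @Override
      public boolean hasNext() {
        return this.previous.hasNext();
      }

      @Override
      public E next() {
        return this.function.apply(this.previous.next());
      }
    }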

> > Compilation:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
> >
> > Pipes:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
> >
> > Beam:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
> >
> > RxJava:
> > https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
> >
> >> I would imagine Pipes would beat the Flowable performance on a single
> >> traversal side-by-side basis (though perhaps not by much), but the
> >> Flowable version would likely scale up to higher throughput and better
> >> CPU utilization when under concurrent load.
> >
> > Pipes is definitely faster than RxJava (single-threaded). While I only
> > learned RxJava 36 hours ago, I don't believe it will ever beat Pipes
> > because Pipes4 is brain-dead simple -- much simpler than in TP3, where a
> > bunch of extra data structures were needed to account for GraphComputer
> > semantics (e.g. ExpandableIterator).
> >
> > I believe, given the CPU utilization/etc. points you make, that RxJava
> > will come into its own in multi-threaded mode (called ParallelFlowable)
> > when trying to get real-time performance from a query that
> > touches/generates lots of data (traversers). This is the reason for
> > Category 2 -- real-time, multi-threaded, single machine. I only gave a
> > quick pass last night at making ParallelFlowable work, but gave up when
> > various test cases were failing (I now believe I know the reason why). I
> > hope to have ParallelFlowable working by mid-week next week and then we
> > can benchmark its performance.
> >
> > I hope I answered your questions or at least explained my confusion.
> >
> > Thanks,
> > Marko.
> >
> > http://rredux.com
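
For reference, the ParallelFlowable mode mentioned above is reached in RxJava
2.x roughly as follows -- a generic sketch, not the TP4 RxJava processor itself:

    import io.reactivex.Flowable;
    import io.reactivex.schedulers.Schedulers;
    import java.util.List;

    public class ParallelSketch {
      public static void main(String[] args) {
        List<Integer> result = Flowable.range(0, 1_000_000)
            .parallel()                          // split into rails, one per CPU by default
            .runOn(Schedulers.computation())     // each rail runs on its own worker thread
            .map(i -> i * 2)
            .filter(i -> i % 3 == 0)
            .sequential()                        // merge the rails back into a single Flowable
            .toList()
            .blockingGet();
        System.out.println(result.size());
      }
    }

Note that parallel() does not preserve element order across rails, so any
order-sensitive semantics have to be dealt with after sequential() merges the
results back together.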

> >
> >> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> --Ted
> >>
> >> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
> >>> (OLAP) execution models. In TP4, if a processing engine can convert a
> >>> bytecode Compilation into a working execution plan, then that is all
> >>> that matters. TinkerPop does not need to concern itself with whether
> >>> that execution plan is "OLTP" or "OLAP", or with the semantics of its
> >>> execution (function oriented, iterator oriented, RDD-based, etc.).
> >>> With that, here are 4 categories of processors that I believe define
> >>> the full spectrum of what we will be dealing with:
> >>>
> >>> 1. Real-time single-threaded single-machine.
> >>>    * This is STANDARD (OLTP) in TP3.
> >>>    * This is the Pipes processor in TP4.
> >>>
> >>> 2. Real-time multi-threaded single-machine.
> >>>    * This does not exist in TP3.
> >>>    * We should provide an RxJava processor in TP4.
> >>>
> >>> 3. Near-time distributed multi-machine.
> >>>    * This does not exist in TP3.
> >>>    * We should provide an Akka processor in TP4.
> >>>
> >>> 4. Batch-time distributed multi-machine.
> >>>    * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
> >>>    * We should provide a Spark processor in TP4.
> >>>
> >>> I'm not familiar with the specifics of the Flink, Apex, DataFlow,
> >>> Samza, etc. stream-based processors. However, I believe they can be
> >>> made to work in near-time or batch-time depending on the amount of
> >>> data pulled from the database. Once we understand these technologies
> >>> better, I believe we should be able to fit them into the categories
> >>> above.
> >>>
> >>> In conclusion: Do these categories make sense to people?
> >>> Terminology-wise -- Near-time? Batch-time? Are these distinctions
> >>> valid?
> >>>
> >>> Thank you,
> >>> Marko.
> >>>
> >>> http://rredux.com
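
To make the batch-time end of the spectrum concrete, a category 4 processor
runs the whole pipeline as a distributed job and only materializes results when
the job finishes, along the lines of this plain Spark sketch (illustrative
only, not the TP4 Spark processor):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import java.util.Arrays;

    public class BatchSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("batch-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          long count = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6))
              .map(i -> i * 2)          // each stage becomes a distributed transformation
              .filter(i -> i % 3 == 0)
              .count();                 // results only come back when the whole job completes
          System.out.println(count);
        }
      }
    }

The contrast with category 1 is the latency model: Pipes streams traversers
back one next() call at a time, while a batch job schedules stages across a
cluster and returns everything at the end.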
