Hello Joshua,

Thanks for the quick feedback!
I think we can make future support easier by removing WebSocket if Arrow
Flight does the job.

Best regards, Valentyn

On Thu, Jun 30, 2022 at 5:14 PM Joshua Shinavier <j...@fortytwo.net> wrote:

> Hi Valentyn,
>
> Thank you for the proposal/summary. Leo Meyerovich and others have
> previously suggested adding Arrow support to TinkerPop; it just hasn't been
> prioritized. I like everything about your description apart from this
> phrase: "should replace the network layer with Arrow Flight". You are not
> suggesting that the WebSocket-based solution be removed, are you? If the
> two could exist in parallel, it definitely would be nice to have an Arrow
> option. WebSocket could perhaps be dropped later if it isn't being used
> much and/or the maintenance burden is too high. Just my $0.02.
>
> Josh
>
>
>
> On Thu, Jun 30, 2022 at 4:36 PM Valentyn Kahamlyk
> <valent...@bitquilltech.com.invalid> wrote:
>
> > Hello Everyone,
> >
> > I would like to propose exploring options to use Arrow Flight as a
> > transport for Gremlin Server. Currently, Gremlin Server and its clients
> > are based on WebSockets with a custom sub-protocol and serialization to
> > GraphSON and GraphBinary. Developers of each driver must implement those
> > protocols from scratch, and very little code is reused (only third-party
> > WebSocket libraries are currently reused in the client variants).
> > Implementing the protocol is complicated and error-prone, so most drivers
> > support only a subset of Gremlin Server features. The maintenance cost
> > also keeps growing as new client variants are added to TinkerPop.
> >
> > ** Motivation **
> > We would like to propose a solution that reduces maintenance and
> > simplifies development of the client drivers by using a standard protocol
> > based on Apache Arrow Flight. Because Arrow Flight is implemented in
> > common languages such as C++, C#, Java and Python, we anticipate that more
> > existing code can be reused, which would help reduce maintenance costs in
> > the future. We can also reuse other Arrow Flight features such as
> > authentication and error handling.
> >
> > ** Assumptions **
> > Proof of Concept Development will be done with Java 8.
> > Need to reuse existing code as much as possible.
> > It is desirable, but not necessary, to maintain compatibility with
> > existing drivers.
> > To simplify development at the initial stage, we will reuse existing
> > serialization mechanisms.
> >
> > ** Requirements **
> > Gremlin Server and drivers should replace the network layer with Arrow
> > Flight.
> > No significant drop in performance.
> > Gremlin Arrow must pass the Gherkin test suite.
> >
> > ** Prototype Design Overview **
> > We would like to explore the solution below and create a prototype to
> > prove that the approach is feasible.
> > The main idea is to replace the transport layer with FlightServer and
> > FlightClient. They support asynchronous data transfer, splitting data
> > into chunks, and authorization. While Arrow Flight typically requires a
> > schema, in the short term we can proceed with an implementation that uses
> > the existing serializers and the GraphBinary format. By using GraphBinary
> > we will not get all the capabilities that Arrow Flight provides out of
> > the box, such as efficient compression. However, in the future we see
> > value in adding the capability to generate a schema on the server side,
> > which can enable additional use cases.
> >
> > First stage: replace transport layer, but keep serializers
> > Pros:
> > Reduction of the code base to be developed and maintained
> > A relatively low number of modifications
> >
> > Cons:
> > We may observe reduced performance due to schema transfer and other
> > overhead. As part of the PoC we will assess the performance overhead for
> > small and large responses and identify options to mitigate it.
> > Still need to support GraphBinary serialization.
> >
> > Second stage: replace transport layer, generate schemas dynamically,
> > and use native Arrow structures for data transmission
> > Pros:
> > Greater reduction of the codebase to be developed and maintained
> > In addition, need to rework the serialization and add schema generation
> > Performance can be improved for large data sets due to Arrow Flight
> > optimizations and the ability to transfer data in parallel
> > No need to support GraphBinary and GraphSON serialization protocols
> >
> > Cons:
> > Reduced performance for small result sets
> > Can be complicated and expensive to generate a schema for each request
> >
> > Please find a few more diagrams in the attached PDF file, and please
> > share your thoughts.
> >
> > Regards, Valentyn
> >
> >
>
