Hello Joshua,

Thanks for the quick feedback! I think we can make future support easier by removing WebSocket if Arrow Flight does the job.

Best regards,
Valentyn

On Thu, Jun 30, 2022 at 5:14 PM Joshua Shinavier <j...@fortytwo.net> wrote:

> Hi Valentyn,
>
> Thank you for the proposal/summary. Leo Meyerovich and others have previously suggested adding Arrow support to TinkerPop; it just hasn't been prioritized. I like everything about your description apart from this phrase: "should replace the network layer with Arrow Flight". You are not suggesting that the WebSocket-based solution be removed, are you? If the two could exist in parallel, it definitely would be nice to have an Arrow option. WebSocket could perhaps be dropped later if it isn't being used much and/or the maintenance burden is too high. Just my $0.02.
>
> Josh
>
> On Thu, Jun 30, 2022 at 4:36 PM Valentyn Kahamlyk <valent...@bitquilltech.com.invalid> wrote:
>
> > Hello Everyone,
> >
> > I would like to propose exploring options to use Arrow Flight as a transport for Gremlin Server. Currently Gremlin Server and clients are based on WebSockets with a custom sub-protocol and serialization to GraphSON and GraphBinary. Developers of each driver must implement those protocols from scratch, and only a limited amount of code is reused (only 3rd party WebSocket libraries are currently reused in the client variants). Implementing the protocol is a complicated and error-prone process, so most drivers support only a subset of Gremlin Server features. The maintenance cost also keeps increasing with the number of new client variants being added to TinkerPop.
> >
> > ** Motivation **
> > We would like to propose a solution to reduce maintenance and simplify the development of the client drivers by using a standard protocol based on Apache Arrow Flight. As Arrow Flight is implemented in the most common languages, like C++, C#, Java and Python, we anticipate that more existing code can be reused, which would help to reduce maintenance costs in the future. Also, we can reuse some other Arrow Flight features like authentication and error handling.
> >
> > ** Assumptions **
> > Proof of Concept development will be done with Java 8.
> > Need to reuse existing code as much as possible.
> > It is desirable, but not necessary, to maintain compatibility with existing drivers.
> > To simplify development at the initial stage, we will reuse existing serialization mechanisms.
> >
> > ** Requirements **
> > Gremlin Server and drivers should replace the network layer with Arrow Flight.
> > No significant drop in performance.
> > Gremlin Arrow must pass the Gherkin test suite.
> >
> > ** Prototype Design Overview **
> > We would like to explore the solution below and create a prototype to prove that the approach is feasible.
> > The main idea is to replace the transport layer with FlightServer and FlightClient. They support asynchronous data transfer, splitting data into chunks, and authorization. While Arrow Flight typically requires a schema, in the short term we can proceed with an implementation using the existing serializers and the GraphBinary format. By using GraphBinary we will not have all the capabilities that Arrow Flight provides out of the box, like efficient compression. However, in the future, we see value in adding the capability to generate a schema on the server side, which can enable additional use cases.
> >
> > First stage: replace transport layer, but keep serializers
> > Pros:
> > Reduction of the code base to be developed and maintained
> > A relatively low number of modifications
> >
> > Cons:
> > We may observe reduced performance due to schema transfer and other overhead. As part of the PoC we will assess performance overhead for small and large responses and identify options to mitigate it.
> > Still need to support GraphBinary serialization.
> >
> > Second stage: replace transport layer, add dynamic schema generation and use native Arrow structures for data transmission
> > Pros:
> > Greater reduction of the codebase to be developed and maintained
> > In addition, need to rework the serialization and add schema generation
> > Performance can be improved for large data sets due to Arrow Flight optimizations and the ability to transfer data in parallel
> > No need to support GraphBinary and GraphSON serialization protocols
> >
> > Cons:
> > Reduced performance for small result sets
> > Can be complicated and expensive to generate a schema for each request
> >
> > Please find a few more diagrams in the attached PDF file, and please share your thoughts.
> >
> > Regards, Valentyn