Looking forward to this; I will try to attend on Aug 12. If Arrow Flight does not work out, I think Neo4j's Bolt could be a strong alternative to GraphBinary.
Advantages Bolt has:
- No max content length issues
- Error handling and recovery built into the protocol
- Handshaking to provide compatibility across client/server wire protocol versions
- Transactions are very well defined within Bolt

There are some others, but those come to mind. See https://7687.org/ for more info.

This also fits the ideology of building bridges, not walls (https://ieeexplore.ieee.org/document/9031506), as it aligns our wire protocol with Neo4j's wire protocol. Because of that, we may get good driver reuse out of this. I haven't looked into the licensing of any of this, though, so that may not be possible.

Lyndon

On Fri, Aug 5, 2022 at 3:31 PM Valentyn Kahamlyk <valent...@bitquilltech.com.invalid> wrote:

> Hello all,
>
> I'm hosting in Discord a short demo for proof of concept using Arrow Flight with Gremlin, using string queries and GraphSON for serialization. Any questions and comments are welcome. The next step will be to create the full designs based on the proof of concept.
>
> The planned date is Aug 12; I will follow up with the exact time later.
>
> On Thu, Jun 30, 2022 at 4:35 PM Valentyn Kahamlyk <valent...@bitquilltech.com> wrote:
>
> > Hello Everyone,
> >
> > I would like to propose exploring options to use Arrow Flight as a transport for Gremlin Server. Currently Gremlin Server and Clients are based on WebSockets with a custom sub-protocol and serialization to GraphSON and GraphBinary. Developers for each driver must implement those protocols from scratch and there is a limited amount of code which is being reused (only 3rd party WebSocket libraries are currently reused in the client variants). The protocol implementation is a complicated and error-prone process, so most drivers only support some subset of Gremlin Server features. The maintenance cost is also constantly increasing with the number of new client variants being added to TinkerPop.
> >
> > ** Motivation **
> > We would like to propose a solution to reduce maintenance and simplify the development of the client drivers by using a standard protocol based on Apache Arrow Flight. As Arrow Flight is implemented in the most common languages, like C++, C#, Java, and Python, we anticipate that a larger amount of existing code can be reused, which would help to reduce maintenance costs in the future. Also, we can reuse some other Arrow Flight features like authentication and error handling.
> >
> > ** Assumptions **
> > Proof of Concept Development will be done with Java 8.
> > Need to reuse existing code as much as possible.
> > It is desirable, but not necessary, to maintain compatibility with existing drivers.
> > To simplify development at the initial stage, we will reuse existing serialization mechanisms.
> >
> > ** Requirements **
> > Gremlin Server and drivers should replace the network layer with Arrow Flight.
> > No significant drop in performance.
> > Gremlin Arrow must pass the Gherkin test suite.
> >
> > ** Prototype Design Overview **
> > We would like to explore the solution below and create a prototype to prove the approach is feasible.
> > The main idea is to replace the transport layer with FlightServer and FlightClient. They support asynchronous data transfer, splitting data into chunks, and authorization. While Arrow Flight typically requires a schema, in the short term we can proceed with the implementation using existing serializers and the GraphBinary format. By using GraphBinary we will not have all the capabilities that Arrow Flight provides out of the box, like efficient compression. However, in the future, we see the value of adding capabilities to generate a schema from the server side, and that can enable additional use cases.
> >
> > First stage: replace transport layer, but keep serializers
> > Pros:
> > Reduction of the code base to be developed and maintained
> > A relatively low number of modifications
> >
> > Cons:
> > We may observe reduced performance due to schema transfer and other overhead. As part of the PoC we will assess performance overhead for small and large responses and identify options to mitigate it.
> > Still need to support GraphBinary serialization.
> >
> > Second stage: replace transport layer, make dynamic schema generation and use native Arrow structures for data transmission
> > Pros:
> > Greater reduction of the codebase to be developed and maintained
> > In addition, need to rework the serialization and add schema generation
> > Performance can be improved for large data sets due to Arrow Flight optimizations and the ability to transfer data in parallel
> > No need to support GraphBinary and GraphSON serialization protocols
> >
> > Cons:
> > Reduced performance for small result sets
> > Can be complicated and expensive to generate a schema for each request
> >
> > Please find a few more diagrams in the attached PDF file and please share your thoughts.
> >
> > Regards, Valentyn

--
*Lyndon Bauto*
*Senior Software Engineer*
*Aerospike, Inc.*
www.aerospike.com
lba...@aerospike.com