Looking forward to this, I will try to attend Aug 12.

If Arrow Flight does not work out, I think Neo4j's Bolt could be a strong
alternative to GraphBinary.

Advantages Bolt has:
- No max content length issues
- Error handling and recovery built into protocol
- Handshaking to provide compatibility across client / server wire protocol
versions
- Transactions are very well defined within Bolt

There are some others, but those come to mind. See https://7687.org/ for
more info.
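
To make the handshaking point a bit more concrete, here is a rough Python
sketch of the version negotiation as I understand it from the published Bolt
docs: the client sends a 4-byte magic preamble followed by four 4-byte version
proposals, and the server answers with the single version it picked (or all
zeros if it supports none). The function names are my own, and the sketch
ignores the minor-version "range" byte that newer Bolt versions add, so treat
it as an illustration rather than a reference implementation.

```python
# Hypothetical helper names; only the byte layout comes from the Bolt docs.
BOLT_MAGIC = bytes([0x60, 0x60, 0xB0, 0x17])  # preamble sent before proposals
NO_VERSION = bytes(4)                          # empty slot / "nothing supported"


def encode_version(major, minor=0):
    """Encode one proposal as 4 bytes: 00 00 <minor> <major>."""
    return bytes([0, 0, minor, major])


def build_handshake(versions):
    """Build the 20-byte client handshake from up to four (major, minor) pairs.

    Proposals are listed in order of client preference; unused slots are zero.
    """
    if not 1 <= len(versions) <= 4:
        raise ValueError("Bolt handshake carries between 1 and 4 proposals")
    slots = [encode_version(major, minor) for major, minor in versions]
    slots += [NO_VERSION] * (4 - len(slots))
    return BOLT_MAGIC + b"".join(slots)


def parse_server_choice(reply):
    """Decode the server's 4-byte reply into (major, minor), or None."""
    if len(reply) != 4:
        raise ValueError("server handshake reply must be exactly 4 bytes")
    if reply == NO_VERSION:
        return None  # server supports none of the proposed versions
    return (reply[3], reply[2])


if __name__ == "__main__":
    request = build_handshake([(4, 4), (4, 3)])
    print(request.hex())
    print(parse_server_choice(bytes([0, 0, 4, 4])))
```

A client would write the output of build_handshake to the socket right after
connecting and read exactly four bytes back. A None result means the two sides
have no version in common, so the client can fail fast instead of discovering
the mismatch mid-session.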

This also fits the ideology of building bridges, not walls
(https://ieeexplore.ieee.org/document/9031506), as it aligns our wire
protocol with Neo4j's wire protocol.

Because of that, we may get good driver reuse out of this. I haven't looked
into the licensing of any of this, though, so that may not be possible.

Lyndon

On Fri, Aug 5, 2022 at 3:31 PM Valentyn Kahamlyk
<valent...@bitquilltech.com.invalid> wrote:

> Hello all, I'm hosting a short demo on Discord of the proof of concept using
> Arrow Flight with Gremlin, using string queries and GraphSON for
> serialization. Any questions and comments are welcome. The next step will
> be to create the full designs based on the proof of concept. The planned
> date is Aug 12; I will follow up with the exact time later.
>
> On Thu, Jun 30, 2022 at 4:35 PM Valentyn Kahamlyk <
> valent...@bitquilltech.com> wrote:
>
> > Hello Everyone,
> >
> > I would like to propose exploring options to use Arrow Flight as a
> transport for Gremlin Server. Currently Gremlin Server and Clients are
> based on WebSockets with a custom sub-protocol and serialization to
> GraphSON and GraphBinary.  Developers for each driver must implement those
> protocols from scratch and there is a limited amount of code which is being
> reused (only 3rd party WebSocket libraries are currently reused in the
> client variants). The protocol implementation is a complicated and
> error-prone process, so most drivers only support some subset of Gremlin
> Server features. The maintenance cost is also constantly increasing with
> the number of new client variants being added to TinkerPop.
> >
> > ** Motivation **
> > We would like to propose a solution to reduce maintenance and simplify
> the development of the client drivers by using a standard protocol based on
> Apache Arrow Flight. As Arrow Flight is implemented in the most common
> languages, such as C++, C#, Java and Python, we anticipate a larger amount of
> existing codebase can be reused which would help to reduce maintenance
> costs in the future. Also, we can reuse some other Arrow Flight features
> like authentication and error handling.
> >
> > ** Assumptions **
> > Proof of Concept Development will be done with Java 8.
> > Need to reuse existing code as much as possible.
> > It is desirable, but not necessary, to maintain compatibility with
> existing drivers.
> > To simplify development at the initial stage, we will reuse existing
> serialization mechanisms.
> >
> > ** Requirements **
> > Gremlin Server and drivers should replace the network layer with Arrow
> Flight.
> > No significant drop in performance.
> > Gremlin Arrow must pass the Gherkin test suite.
> >
> > ** Prototype Design Overview **
> > We would like to explore the solution below and create a prototype to prove
> the approach is feasible.
> > The main idea is to replace the transport layer with FlightServer and
> FlightClient. They support asynchronous data transfer, splitting data into
> chunks, and authorization. While Arrow Flight typically requires a schema, in
> the short term we can proceed with implementation using existing serializers
> and the GraphBinary format. By using GraphBinary we will not have all
> capabilities that Arrow Flight provides out of the box, like efficient
> compression. However, in the future, we see the value of adding
> capabilities to generate a schema from the server-side, and that can enable
> additional use cases.
> >
> > First stage: replace transport layer, but keep serializers
> > Pros:
> > Reduction of the code base to be developed and maintained
> > A relatively low number of modifications
> >
> > Cons:
> > We may observe reduced performance due to schema transfer and other
> overhead. As part of the PoC we will assess performance overhead for small
> and large responses and identify options to mitigate it.
> > Still need to support GraphBinary serialization.
> >
> > Second stage: replace transport layer, make dynamic schema generation
> and use native Arrow structures for data transmission
> > Pros:
> > Greater reduction of the codebase to be developed and maintained
> > In addition, need to rework the serialization and add schema generation
> > Performance can be improved for large data sets due to Arrow Flight
> optimizations and the ability to transfer data in parallel
> > No need to support GraphBinary and GraphSON serialization protocols
> >
> > Cons:
> > Reduced performance for small result sets
> > Can be complicated and expensive to generate a schema for each request
> >
> > Please find a few more diagrams in the attached PDF file, and
> please share your thoughts.
> >
> > Regards, Valentyn
> >
> >
>


-- 

*Lyndon Bauto*
*Senior Software Engineer*
*Aerospike, Inc.*
www.aerospike.com
lba...@aerospike.com
