Re: Feature: Support gremlin adapter

柳尘 Mon, 25 Dec 2023 16:00:33 -0800

Thank you for your reply. You are welcome to comment and ask questions
about my proposal.


Forward Xu <[email protected]> 于2023年12月25日周一 09:51写道：

> hi
>
> This is a great feature to extend calcite from regular data queries to
> graph queries (calculations),
> +1 for looking forward to it.
>
> forwardxu
>
> 柳尘 <[email protected]> 于2023年12月24日周日 11:20写道：
>
> > Motivation
> >
> > Hi, community. Currently, more and more users are using some graph
> > databases, such as JanusGraph, HugeGraph, etc.
> > To do some relationship representation of personnel social networks,
> > It is used to count the activity of each user in the social network. Most
> > graph databases are in the graph building and graph traversal stage.
> > All will be implemented using Gremlin syntax.
> > However, this is very unfriendly to users who are not familiar with
> gremlin
> > syntax. While calcite exists as a query framework,
> > It also provides an adapter interface to adapt to different database
> > dialects, such as parsing, relational algebra conversion, and query plan
> > binding.
> > Our company has solved the problem of adapting various graph databases.
> > This is my warehouse:
> https://github.com/kaori-seasons/calcite-gremlin-sql
> >
> >
> > Background
> >
> > Calcite itself supports the database language expansion of the adapter,
> > which enables users to understand the meaning of the grammar.
> > Complete the simplification of the dialect. For example, expand SqlNode
> to
> > implement syntax analysis, and expand RelNode to implement logical plan
> > mapping.
> >
> > thinkerpop is an adaptation framework for various graph databases. In
> this
> > framework, gremlin syntax is mentioned for the first time.
> > It has now become a universal query layer for graph databases. While
> > expanding query statements through calcite’s adapter interface,
> > We will also use thinkerpop's universal graph database API to provide
> > dialect compatibility for different graph databases.
> >
> > Give a simple example:
> > From
> >
> > SELECT "key" FROM inttype
> >
> >  maps to
> >
> >
> >
> g.V().hasLabel("inttype").group().unfold().select(Column.values).order().by(_.unfold().id()).project("inttype").
> >
> >
> by(.project("key").by(.unfold().choose(.has("key"),.values("key"),_.constant("\$%#NULL#
> > %\$"))))
> >
> >
> >
> >
> >
> > The design architecture is divided into three layers.
> >
> > Analytical syntax layer, relational algebra transformation layer, logical
> > plan binding layer.
> >
> > Parsing syntax layer: In the parsing query statement phase, fields and
> > equivalent conditions are split and converted into points and edges.
> >
> > Relational algebra layer: Split it into a collection of points and edges,
> > and convert it into a TableScan during the aggregation/sorting/query
> stage
> > where calcite abstracts it.
> > It is convenient to generate query plans based on conditions and field
> > information.
> > Connection scanning/single table filtering and other methods can be used
> to
> > traverse from any edge/any starting point in the graph
> >
> > Logical plan binding layer: Bind behaviors such as connection
> > scanning/single table filtering/projection to calcite’s planner to
> generate
> > query plans.
> >
> > The process of generating Gremlin logical plan using select statement:
> >
> > 1. First of all, all graph databases start from a source point to build
> the
> > graph. We will use the GraphTraversalSource provided by thinkerpop.
> > As the origin, extract the syntax of the incoming point and side
> > information. This step will be implemented in SqlSchemaGrabber
> > 2. Secondly, for select/where/having/order by/group by our plan in the
> > parsing phase is as follows:
> >
> >   - group by: for a point. There are out-degree and in-degree
> > attributes in the graph. From the perspective of the data table, it is
> > equivalent to converting the table data into IN or OUT.
> > These two dimensions are aggregated. This behavior also corresponds to
> > the table traversal graph operation. During the graph traversal
> > process, fold/unfold tags will be generated, representing the
> > direction of graph traversal.
> >   - select: For the select operation, the operation of scanning the
> > entire table can be regarded as projecting all columns into point
> > attributes. The value changes of each column are mapped to the gremlin
> > operation of adding points.
> >   - where: The filter operation is used in graph computing semantic
> > scenarios. It can be regarded as the edges connected by the out-degree
> > and in-degree of the filter point, so it does not involve the
> > conversion of relational algebra.
> >   Instead, it is pushed directly to the logical plan.
> >   - order by: In the process of graph traversal, we mentioned that
> > fold/unflod will be generated on the path to represent the
> > forward/backward direction.
> >   If a field is encountered and there is no value that can be sorted,
> > we will fall back to the origin of GraphTraversalSource and end the
> > sorting operation.
> >   If there are values that can be sorted, they will be unified in
> > SqlTraversalEngine, in-degree and out-degree will be counted
> > separately for aggregation, and then used with group by according to
> > label (IN/OUT)
> >   - having: The meaning is the same as group by, but the label is
> > different (in addition to the IN/OUT columns, specific point fields
> > need to be specified)
> >
> >  Currently, I have only completed unit tests that translate from SQL to
> > Gremlin execution plan, among which test cases for group by and where are
> > to be added. In addition, I will also use mainstream graph databases such
> > as neo4j and JanusGraph as examples to write sound integration tests. ,
> > ensuring that the API of the graph database is successfully called after
> > converting the sql request into the gremlin execution plan.
> >
> > Finally, community members are welcome to give suggestions.
> >
>

Re: Feature: Support gremlin adapter

Reply via email to