Thank you for your reply. You are welcome to comment and ask questions about my proposal.
Forward Xu <[email protected]> 于2023年12月25日周一 09:51写道: > hi > > This is a great feature to extend calcite from regular data queries to > graph queries (calculations), > +1 for looking forward to it. > > forwardxu > > 柳尘 <[email protected]> 于2023年12月24日周日 11:20写道: > > > Motivation > > > > Hi, community. Currently, more and more users are using some graph > > databases, such as JanusGraph, HugeGraph, etc. > > To do some relationship representation of personnel social networks, > > It is used to count the activity of each user in the social network. Most > > graph databases are in the graph building and graph traversal stage. > > All will be implemented using Gremlin syntax. > > However, this is very unfriendly to users who are not familiar with > gremlin > > syntax. While calcite exists as a query framework, > > It also provides an adapter interface to adapt to different database > > dialects, such as parsing, relational algebra conversion, and query plan > > binding. > > Our company has solved the problem of adapting various graph databases. > > This is my warehouse: > https://github.com/kaori-seasons/calcite-gremlin-sql > > > > > > Background > > > > Calcite itself supports the database language expansion of the adapter, > > which enables users to understand the meaning of the grammar. > > Complete the simplification of the dialect. For example, expand SqlNode > to > > implement syntax analysis, and expand RelNode to implement logical plan > > mapping. > > > > thinkerpop is an adaptation framework for various graph databases. In > this > > framework, gremlin syntax is mentioned for the first time. > > It has now become a universal query layer for graph databases. While > > expanding query statements through calcite’s adapter interface, > > We will also use thinkerpop's universal graph database API to provide > > dialect compatibility for different graph databases. > > > > Give a simple example: > > From > > > > SELECT "key" FROM inttype > > > > maps to > > > > > > > g.V().hasLabel("inttype").group().unfold().select(Column.values).order().by(_.unfold().id()).project("inttype"). > > > > > by(.project("key").by(.unfold().choose(.has("key"),.values("key"),_.constant("\$%#NULL# > > %\$")))) > > > > > > > > > > > > The design architecture is divided into three layers. > > > > Analytical syntax layer, relational algebra transformation layer, logical > > plan binding layer. > > > > Parsing syntax layer: In the parsing query statement phase, fields and > > equivalent conditions are split and converted into points and edges. > > > > Relational algebra layer: Split it into a collection of points and edges, > > and convert it into a TableScan during the aggregation/sorting/query > stage > > where calcite abstracts it. > > It is convenient to generate query plans based on conditions and field > > information. > > Connection scanning/single table filtering and other methods can be used > to > > traverse from any edge/any starting point in the graph > > > > Logical plan binding layer: Bind behaviors such as connection > > scanning/single table filtering/projection to calcite’s planner to > generate > > query plans. > > > > The process of generating Gremlin logical plan using select statement: > > > > 1. First of all, all graph databases start from a source point to build > the > > graph. We will use the GraphTraversalSource provided by thinkerpop. > > As the origin, extract the syntax of the incoming point and side > > information. This step will be implemented in SqlSchemaGrabber > > 2. 
> > The design is organized into three layers: a parsing layer, a
> > relational-algebra layer, and a logical-plan binding layer.
> >
> > Parsing layer: during query parsing, fields and equality conditions
> > are split out and converted into vertices and edges.
> >
> > Relational-algebra layer: the query is split into collections of
> > vertices and edges, and the aggregation/sorting/query stages that
> > Calcite abstracts are converted into a TableScan. This makes it
> > convenient to generate query plans from the conditions and field
> > information; join scans, single-table filters, and similar operations
> > can then traverse the graph from any edge or any starting vertex.
> >
> > Logical-plan binding layer: behaviors such as join scans, single-table
> > filters, and projections are bound to Calcite's planner to generate
> > query plans.
> >
> > The process of generating a Gremlin logical plan from a SELECT
> > statement:
> >
> > 1. All graph databases build the graph from a source point. We use the
> > GraphTraversalSource provided by TinkerPop as the origin and extract
> > the vertex and edge information from the incoming syntax. This step is
> > implemented in SqlSchemaGrabber (a schema-discovery sketch follows
> > this list).
> > 2. For select/where/having/order by/group by, the plan in the parsing
> > phase is as follows:
> >
> > - group by: a vertex has out-degree and in-degree attributes in the
> > graph. From the data-table perspective, this is equivalent to
> > aggregating the table data along the two dimensions IN and OUT. This
> > behavior also corresponds to the table-traversal operation: during
> > graph traversal, fold/unfold tags are generated, representing the
> > direction of traversal (see the second sketch after this list).
> > - select: scanning the whole table can be regarded as projecting all
> > columns onto vertex attributes; the value changes of each column map
> > to the Gremlin operation of adding vertices.
> > - where: in graph-computing semantics, a filter can be regarded as the
> > edges connected to the filtered vertex's out-degree and in-degree, so
> > it involves no relational-algebra conversion and is pushed directly
> > into the logical plan.
> > - order by: as mentioned above, fold/unfold tags are generated on the
> > traversal path to represent the forward/backward direction. If a field
> > has no sortable value, we fall back to the GraphTraversalSource origin
> > and end the sort. If there are sortable values, they are unified in
> > the SqlTraversalEngine: in-degree and out-degree are counted
> > separately for aggregation and then combined with group by on the
> > label (IN/OUT).
> > - having: same meaning as group by, but with a different label (in
> > addition to the IN/OUT columns, specific vertex fields must be
> > specified).
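> > As promised above, a schema-discovery sketch for step 1, in plain
> > TinkerPop. This is not the actual SqlSchemaGrabber code; it only
> > illustrates the information that class needs to collect (one "table"
> > per vertex label, one "column" per property key):
> >
> > import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
> > import org.apache.tinkerpop.gremlin.structure.T;
> > import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
> >
> > import java.util.Map;
> >
> > import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.properties;
> >
> > public class SchemaGrabberSketch {
> >     public static void main(String[] args) {
> >         GraphTraversalSource g = TinkerGraph.open().traversal();
> >         g.addV("inttype").property("key", 1).iterate();
> >         g.addV("person").property("name", "a").property("age", 29).iterate();
> >
> >         // Group vertex labels to their distinct property keys:
> >         // each label becomes a table, each property key a column.
> >         Map<Object, Object> tables = g.V()
> >                 .group()
> >                 .by(T.label)
> >                 .by(properties().key().dedup().fold())
> >                 .next();
> >         tables.forEach((label, columns) ->
> >                 System.out.println(label + " -> " + columns));
> >     }
> > }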
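> > And the group-by sketch: an aggregation query could translate along
> > the following lines. This is illustrative only; the traversal the
> > translator actually emits (with its IN/OUT labels and fold/unfold
> > tags) may differ, and the table and column names are made up for the
> > example.
> >
> > import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
> > import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
> >
> > import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.count;
> >
> > public class GroupByExample {
> >     public static void main(String[] args) {
> >         GraphTraversalSource g = TinkerGraph.open().traversal();
> >         g.addV("person").property("age", 29).iterate();
> >         g.addV("person").property("age", 29).iterate();
> >         g.addV("person").property("age", 35).iterate();
> >
> >         // One plausible counterpart for:
> >         //   SELECT "age", COUNT(*) FROM person GROUP BY "age"
> >         // group().by(key).by(count()) plays the role of GROUP BY +
> >         // COUNT(*), and unfold() exposes one (key, count) entry per
> >         // group.
> >         g.V().hasLabel("person")
> >                 .group().by("age").by(count())
> >                 .unfold()
> >                 .forEachRemaining(System.out::println);
> >     }
> > }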
> > Currently I have only completed unit tests that translate SQL into a
> > Gremlin execution plan; test cases for group by and where are still to
> > be added. I will also write thorough integration tests against
> > mainstream graph databases such as Neo4j and JanusGraph, to verify
> > that the graph database's API is called successfully after a SQL
> > request has been converted into a Gremlin execution plan.
> >
> > Finally, community members are welcome to give suggestions.

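P.S. For anyone who wants to experiment before the integration tests
land, the skeleton below shows the intended shape of such a test against
any Gremlin Server endpoint (JanusGraph and HugeGraph expose one; Neo4j
can be driven through TinkerPop's neo4j-gremlin module). The
SqlToGremlin.translate() call is a placeholder, not the actual class name
in the repository.

import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertFalse;

public class SqlToGremlinIT {
    @Test
    public void selectIsTranslatedAndExecuted() throws Exception {
        // Connect to a running Gremlin Server (host/port are placeholders).
        try (GraphTraversalSource g = AnonymousTraversalSource.traversal()
                .withRemote(DriverRemoteConnection.using("localhost", 8182, "g"))) {

            // Placeholder for the translator's entry point:
            // GraphTraversal<?, ?> t =
            //         SqlToGremlin.translate(g, "SELECT \"key\" FROM inttype");
            // assertFalse(t.toList().isEmpty());

            // Until then, a sanity check that the endpoint answers queries.
            assertFalse(g.V().limit(1).toList().isEmpty());
        }
    }
}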