qingwen220 commented on code in PR #6: URL: https://github.com/apache/geaflow-website/pull/6#discussion_r2427846719
########## blog/28.md: ########## @@ -1,49 +1,49 @@ --- -title: 流图计算之增量match原理与应用 +title: Principles and Applications of Incremental Match in Streaming Graph Computing date: 2025-6-3 ---  -## 问题背景 - -在流式计算中,数据往往不是全部一批到来,而会源源不断地进行输入和计算,在图计算/图查询领域,也存在类似的场景,图的点边不断地从数据源读取,进行构图,从而形成增量图。在增量图查询中,图随时发生着变化,在不同的图版本中,进行图查询的结果也会有所不同。对于某一次新增的点边,构成了一个新的版本的图,如果重新对全图(即当前所有点边)进行图遍历,开销较大,并且也会和历史数据有重复。由于历史的数据已经计算过一遍,理想情况下,只需要对增量所影响的部分进行计算/查询,而不需要对全图重新进行查询。 +## Problem Background +In streaming computing, data rarely arrives all at once but is continuously input and processed. Similarly, in graph computing/graph querying scenarios, vertices and edges are constantly read from data sources to construct graphs incrementally. In incremental graph queries, the graph evolves continuously, leading to different query results across graph versions. When new vertices/edges form an updated graph version, recomputing through the entire graph incurs high overhead and duplicates historical computations. Since historical data has already been processed, ideally only the delta-affected portions should be computed/queried without full-graph re-execution. <!-- truncate --> -<font style="color:rgb(51, 51, 51);">GQL(Graph Query Language)</font><font style="color:rgb(0, 0, 0);">是国际标准化组织(ISO)为标准化图查询语言所制定的一个标准,</font><font style="color:rgb(51, 51, 51);">用于在图上执行查询的语言。Geaflow 是蚂蚁图计算团队开源的流图计算引擎,专注于处理动态变化的图数据,支持大规模、高并发的实时图计算场景。</font>本文将介绍在 Geaflow 引擎中,对增量图使用 GQL 进行增量 Match 的方法,目的尽可能地只对增量的数据进行查询,避免冗余的全量计算。 +<font style="color:rgb(51, 51, 51);">GQL (Graph Query Language)</font> <font style="color:rgb(0, 0, 0);">is an international standard developed by ISO for graph query languages,</font> <font style="color:rgb(51, 51, 51);">used to execute queries on graphs. Geaflow is an open-source streaming graph engine by Ant Group’s graph computing team, specializing in dynamically changing graph data and supporting large-scale, high-concurrency real-time graph computing scenarios.</font> This article introduces Geaflow’s approach to incremental GQL-based Match queries on dynamic graphs, aiming to execute queries solely on delta data while avoiding redundant full computations.  -## 当前问题 - -<font style="color:rgb(0, 0, 0);">Geaflow 引擎基于点中心框架(vertex center),通过迭代的方式,每一轮迭代中,每个点向其他点发送消息,并在下一轮收到消息时进行处理、分析。</font>在 Geaflow 的框架中,GQL 的查询需要从前往后进行 Traversal 遍历走图,即从起始节点开始出发,进行扩散,依次进行点边匹配,直到匹配到所需要的查询 pattern。在动态图里场景,如果只使用当前批次新增的点边触发计算,增量的结果会有缺失,例如下面例子所示。 - -<div style="text-align: center;"> -<img src="https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/23857192/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg" alt='画板' height="281" width="486"></div> - -如上问题关键在于如果只考虑增量的部分,则点 A1 无法触发计算,但是点 A1 实际包含于增量结果中。所以需要设法让点 A1 参与计算,我们考虑一种从新增点扩充子图的方法,将 a 触发。将整个查询分为 2 个阶段,Evolve 扩展阶段和 Traversal 阶段。在 Evolve 阶段中,从起始点开始,向邻居发送 EvolveMessage,后续的 iteration 中,收到 EvolveMessage 的点加入到 EvolveVertices 集合中。而后的 Traversal 阶段则会使用 EvolveVertices 里的点触发遍历,即表示当前窗口的触发点。 +## Current Challenges +<font style="color:rgb(0, 0, 0);">The Geaflow engine adopts a vertex-centric framework, where each vertex sends messages iteratively. Vertices process received messages in subsequent iterations.</font> For GQL queries, traversal starts from initial vertices for pattern matching (e.g., from node `A` to `B` to `C`). In dynamic graphs, if only newly added vertices/edges trigger computation, results may be incomplete, as illustrated below: -## 方案步骤 +<div style="text-align: center;"> +<img src="https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/23857192/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg" alt='画板' height="281" width="486"> +</div> -整体流程示例图如下: +The key issue is that **Vertex A1 cannot trigger computation if only the delta is considered**, yet it belongs to the incremental results. To resolve this, we propose a subgraph expansion method from new vertices. The query is divided into two phases: +1. **Evolve Phase**: Propagate `EvolveMessage` from new vertices to neighbors, adding recipients to the `EvolveVertices` set. +2. **Traversal Phase**: Use `EvolveVertices` as traversal triggers for the current window. +## Solution Workflow +Overall process:  -1. 首先得到 query 的计划的迭代次数 N,需向外扩充 N-1 度(<font style="color:#000000;">maxEvolveIteration=N-1)</font>,即可覆盖当前 query。框架的最大迭代数将设置为 N + maxEvolveIteration(N>2) +**Steps:** +1. Determine the query’s iteration count `N`. Expand `N-1` hops outward (`maxEvolveIteration = N-1`) to cover the query. The max iteration becomes `N + maxEvolveIteration` (when `N>2`). ```sql 例如 Review Comment: for example -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
