[ 
https://issues.apache.org/jira/browse/GSOC-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imba Jin updated GSOC-283:
--------------------------
    Description: 
Apache [HugeGraph|https://hugegraph.apache.org/] (incubating) is a fast and 
highly scalable [graph 
database|https://en.wikipedia.org/wiki/Graph_database]/computing/AI ecosystem. 
Billions of vertices and edges can be easily stored in and queried from 
HugeGraph thanks to its excellent OLTP/OLAP capabilities.
 
Website: [https://hugegraph.apache.org/]
GitHub:
 - [https://github.com/apache/incubator-hugegraph/]
 - [https://github.com/apache/incubator-hugegraph-ai/]
 
h2. Description

Currently, we have implemented a basic GraphRAG that relies on fixed processing 
workflows (e.g., knowledge retrieval & graph structure updates using the same 
execution pipeline), leading to insufficient flexibility and high overhead in 
complex scenarios. The proposed task introduces an Agentic architecture based 
on the principles of "dynamic awareness, lightweight scheduling, concurrent 
execution," focusing on solving the following issues:
 # {*}Rigid Intent Recognition{*}: Existing systems cannot effectively 
distinguish between simple retrievals (e.g., entity queries) and complex 
operations (e.g., multi-hop reasoning), often defaulting to BFS-based template 
subgraph searches.
 # {*}Coupled Execution Resources{*}: Memory/computational resources are not 
isolated based on task characteristics, causing long-tail tasks to block 
high-priority requests.
 # {*}Lack of Feedback Mechanisms{*}: Absence of self-correction capabilities 
for erroneous operations (e.g., automatically switching to similar 
vertices/entities after path retrieval failures).

The task will include three core parts:

*1. Dynamic Awareness Layer*
 * Implement an LLM-based real-time intent classifier that categorizes tasks 
(L1 simple retrieval/L2 path reasoning/L3 graph computation/L4+ etc.) based on 
semantic features (verb types/entity complexity/temporal modifiers).
 * Build a lightweight operation cache to generate feature hashes for 
high-frequency requests, enabling millisecond-level intent matching.
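The two bullets above could fit together roughly as follows. This is only a sketch: the `llm_classify` callable, the L1–L3 labels, and the cache shape are assumptions for illustration, not an existing HugeGraph API.

```python
import hashlib

def feature_hash(query: str) -> str:
    """Normalize the query and hash it so repeated requests hit the cache."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

class IntentClassifier:
    """Cache-first intent classifier: millisecond hash lookup on the
    fast path, LLM call only on a cache miss (hypothetical interface)."""

    def __init__(self, llm_classify):
        # llm_classify: callable(str) -> str, e.g. a wrapped LLM call
        self._llm_classify = llm_classify
        self._cache = {}  # feature hash -> intent level ("L1", "L2", ...)

    def classify(self, query: str) -> str:
        key = feature_hash(query)
        if key in self._cache:             # fast path: cached intent
            return self._cache[key]
        level = self._llm_classify(query)  # slow path: ask the LLM
        self._cache[key] = level
        return level
```

In practice the feature hash would likely cover more than the raw text (entities, verb class), but the cache-before-LLM structure is the point.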

*2. Task Orchestration Layer*
 * Introduce a suitable workflow/taskflow framework emphasizing low coupling, 
high performance, and flexibility.
 * Adopt a preemptive scheduling mechanism allowing high-priority tasks to 
pause non-critical phases of low-priority tasks (e.g., suspending subgraph 
preloading without interrupting core computations).
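One way to prototype the preemptive idea is a cooperative scheduler: tasks are generators that yield at safe pause points (e.g. between subgraph-preload steps), and the scheduler always resumes the highest-priority pending task. This is an illustrative sketch, not a reference to any existing framework.

```python
import heapq
from itertools import count

class PreemptiveScheduler:
    """Resume the highest-priority task at every yield point, so a
    newly submitted high-priority task runs before a low-priority
    task's remaining non-critical phases."""

    def __init__(self):
        self._queue = []     # (priority, seq, task) min-heap
        self._seq = count()  # tie-breaker preserves FIFO order

    def submit(self, priority, task):
        heapq.heappush(self._queue, (priority, next(self._seq), task))

    def run(self):
        finished = []
        while self._queue:
            priority, seq, task = heapq.heappop(self._queue)
            try:
                next(task)  # run until the task's next pause point
                heapq.heappush(self._queue, (priority, seq, task))
            except StopIteration as stop:
                finished.append(stop.value)
        return finished
```

A real implementation would need timeouts and resource isolation on top, but the pause-point contract is what lets preemption avoid interrupting core computations.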

*3. Concurrent Execution*
 * Decouple traditional RAG pipelines into composable operations (entity recall 
→ path validation → context enhancement → result refinement), with dynamic 
enable/disable support for each component.
 * Implement automatic execution engine degradation, triggering fallback 
strategies upon sub-operation failures (e.g., switching to alternative methods 
if Gremlin queries timeout).
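The decoupled pipeline with per-step enable/disable and fallback could look roughly like this (the step names and the `context` dict are hypothetical, not the current hugegraph-ai interfaces):

```python
class PipelineStep:
    def __init__(self, name, run, fallback=None, enabled=True):
        self.name = name
        self.run = run            # callable(context) -> context
        self.fallback = fallback  # alternative callable used on failure
        self.enabled = enabled    # dynamic enable/disable switch

def run_pipeline(steps, context):
    """Run enabled steps in order; on failure, degrade to the step's
    fallback instead of aborting the whole request."""
    for step in steps:
        if not step.enabled:
            continue
        try:
            context = step.run(context)
        except Exception:
            if step.fallback is None:
                raise
            # e.g. fall back to vector search if a Gremlin query times out
            context = step.fallback(context)
    return context
```

Because each stage is just a named callable, the orchestration layer can toggle stages per intent level and swap fallbacks without touching the others.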

h2. Recommended Skills
 # Proficiency in Python and familiarity with at least one open/closed-source 
LLM.
 # Experience with at least one LLM RAG/Agent framework such as 
LangGraph/RAGFlow/LlamaIndex/Dify.
 # Knowledge of LLM optimization techniques and RAG construction (KG 
extraction/construction experience is a plus).
 # Strong algorithmic engineering skills (problem abstraction, algorithm 
research, big data processing, model tuning).
 # Familiarity with VectorDB/Graph/KG/HugeGraph read-write workflows and 
principles.
 # Understanding of graph algorithms (e.g., community detection, centrality, 
PageRank) and open-source community experience preferred.

h3. Task List
 * Develop a hierarchical triggering mechanism for the intent classifier to 
categorize L1~LN tasks within milliseconds (accuracy >90%).

 * Semi-automatically generate Graph Schema/extraction prompts.

 * Support dynamic routing and query decomposition.

 * Design an execution trace tracker to log micro-operation resource 
consumption and generate optimization reports.

 * Enhance retrieval with graph algorithms: Apply node importance evaluation, 
path search, etc., to optimize knowledge recall.

 * Implement a dialogue memory management module for context-aware state 
tracking and information reuse.
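For the graph-algorithm retrieval item above, node importance can be prototyped with plain power-iteration PageRank over an adjacency dict. This is a self-contained sketch for ranking recalled knowledge, not HugeGraph's built-in PageRank job.

```python
def pagerank(adj, damping=0.85, iters=30):
    """Power-iteration PageRank on {node: [out-neighbors]}.
    Dangling nodes spread their rank evenly over all nodes."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if not outs:  # dangling node
                for u in nodes:
                    nxt[u] += damping * rank[v] / n
            else:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    nxt[u] += share
        rank = nxt
    return rank

def top_k(adj, k=2):
    """Return the k most important nodes, e.g. to prioritize recall."""
    r = pagerank(adj)
    return sorted(r, key=r.get, reverse=True)[:k]
```

In the actual task this scoring would run on subgraphs recalled from HugeGraph, combined with path search, rather than on an in-memory dict.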

h3. Size
 * Difficulty: Hard

 * Project size: ~350 hours (full-time/large)

h2. Potential Mentors
 * Imba Jin: [j...@apache.org|mailto:j...@apache.org] (Apache HugeGraph PPMC)
 * Simon: [m...@apache.org|mailto:m...@apache.org] (Apache HugeGraph PPMC)
 



> [GSoC][HugeGraph] Implement Agentic GraphRAG Architecture
> ---------------------------------------------------------
>
>                 Key: GSOC-283
>                 URL: https://issues.apache.org/jira/browse/GSOC-283
>             Project: Comdev GSOC
>          Issue Type: Task
>            Reporter: Imba Jin
>            Priority: Major
>              Labels: RAG, agent, graph, gsoc2025



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: gsoc-unsubscr...@community.apache.org
For additional commands, e-mail: gsoc-h...@community.apache.org
