Kryst4lDem0ni4s commented on issue #183:
URL:
https://github.com/apache/incubator-hugegraph-ai/issues/183#issuecomment-2692203413
I looked further into what @chiruu12 suggests about not using off-the-shelf
agentic components, which can prevent developers from understanding critical
behaviors, so that over time the behavior of the service doesn't drift out of
control.
Indeed, it would be possible to write a custom HG_agentic library that
borrows only the necessary pieces, so HugeGraph can maintain control over the
logic and integration details.
How about combining all of our suggestions into a dual-mode, modular GraphRAG
system that integrates LlamaIndex, Pydantic-AI, CrewFlow, and Agno, while
avoiding the dependency hell that @Aryankb warned about?
A hybrid GraphRAG system that supports two modes could include:
• A beginner-friendly “agentic retriever” that is pre-fine-tuned with robust
LLM prompting for straightforward use cases, and
• A customizable mode for advanced developers who need to tailor retrieval,
orchestration, and validation mechanisms.
Key design principles so that everyone can get a good night's sleep:
• Modularity & Microservices: standalone services with clearly defined APIs.
• Dual-Mode Operation: ease of use and deep customization.
• Transparent Integration: extracting core functionalities and integrating
them in-house.
• Extensive Logging & Monitoring: via Prometheus.
• Containerization: isolate dependencies.
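For the logging & monitoring principle, here is a minimal sketch of per-service metrics using prometheus_client; the metric names and port are placeholders, not an agreed convention:

```python
# Minimal sketch of per-service metrics with prometheus_client
# (metric names and the port are placeholders, not a final convention).
import time
from prometheus_client import Counter, Histogram, start_http_server

QUERIES_TOTAL = Counter("hg_agentic_queries_total", "Queries handled by this service")
QUERY_LATENCY = Histogram("hg_agentic_query_seconds", "Query latency in seconds")

def handle_query(statement: str) -> dict:
    QUERIES_TOTAL.inc()
    with QUERY_LATENCY.time():        # records elapsed time when the block exits
        time.sleep(0.01)              # placeholder for real query execution
        return {"statement": statement, "result": []}

if __name__ == "__main__":
    start_http_server(9100)           # exposes /metrics for Prometheus scraping
    handle_query("g.V().limit(1)")
```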
Architectural Layers & Components:
A. Base Layer – Agno for L1 Queries
Handles high-frequency, low-latency queries (e.g., simple entity lookups)
with optimized parallel execution. Beyond this, we also need to find the right
strategy for handling LN queries.
Key Features:
• Fast execution with low memory footprint.
• Built-in Gremlin-Cypher transpiler for hybrid query support.
• Integration with a hybrid caching layer that combines Agno shared memory
and RocksDB.
• Wrap Agno’s core query engine in a microservice that exposes an HTTP
endpoint.
• Queries can be configured to pass through a lightweight pre-processing
step that selects between cache and live query execution (specific to L1).
Once abstracted into our own agentic library, this component will be the base
of all performance optimizations.
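As a rough sketch of that base-layer microservice, assuming FastAPI as the HTTP layer; the cache dict and execute_agno_query below are hypothetical stand-ins for the hybrid caching layer and Agno's actual query engine:

```python
# Sketch of the /query/l1 microservice (assumption: FastAPI as the HTTP layer;
# execute_agno_query and _cache are placeholders, not Agno's real API).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
_cache: dict[str, dict] = {}  # stand-in for the hybrid Agno shared-memory/RocksDB cache


class L1Query(BaseModel):
    statement: str        # Gremlin or Cypher text
    use_cache: bool = True


def execute_agno_query(statement: str) -> dict:
    """Placeholder for the real Agno-backed execution path."""
    return {"statement": statement, "result": []}


@app.post("/query/l1")
def query_l1(query: L1Query) -> dict:
    # Lightweight pre-processing step: decide between cache and live execution.
    if query.use_cache and query.statement in _cache:
        return {"source": "cache", **_cache[query.statement]}
    result = execute_agno_query(query.statement)
    _cache[query.statement] = result
    return {"source": "live", **result}
```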
B. Orchestration Layer – CrewAI for Complex Workflows
This layer would help us manage multi-hop, dynamic queries and agent workflows
that require intent classification and asynchronous execution, while still
allowing customization.
Key Features:
• Dynamic intent classification powered by domain-specific embeddings
(integrated with HugeGraph).
• Event-driven workflow, where subtasks are dynamically generated from a
user’s plain-English prompt.
• Built-in support for sequencing (sequential/parallel) and conditional
delegation of agent tasks.
• Adapt core functionalities from CrewAI (CrewFlow) to create a custom
orchestration module.
• Define a clear API contract for submitting workflows, retrieving status,
and handling error/fallback logic.
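As a rough illustration of the orchestration contract (this is not CrewAI's actual API; the intent classifier, planner, and agent runner below are hypothetical stand-ins), the flow could look like this:

```python
# Sketch of the custom orchestration module (hypothetical names throughout;
# only the shape of the workflow contract is shown).
import asyncio
from dataclasses import dataclass, field


@dataclass
class Subtask:
    agent: str                               # e.g. "retriever", "summarizer"
    payload: dict = field(default_factory=dict)


def classify_intent(prompt: str) -> str:
    """Stand-in for intent classification via domain-specific embeddings."""
    return "multi_hop" if "related to" in prompt else "lookup"


def plan_subtasks(prompt: str, intent: str) -> list[Subtask]:
    """Dynamically generate subtasks from the user's plain-English prompt."""
    if intent == "multi_hop":
        return [Subtask("retriever", {"prompt": prompt}),
                Subtask("summarizer", {"prompt": prompt})]
    return [Subtask("retriever", {"prompt": prompt})]


async def run_subtask(task: Subtask) -> dict:
    """Placeholder for delegating a subtask to the matching agent."""
    await asyncio.sleep(0)                    # simulate asynchronous agent work
    return {"agent": task.agent, "status": "ok"}


async def submit_workflow(prompt: str, parallel: bool = True) -> list[dict]:
    intent = classify_intent(prompt)
    tasks = plan_subtasks(prompt, intent)
    if parallel:                              # parallel delegation
        return await asyncio.gather(*(run_subtask(t) for t in tasks))
    return [await run_subtask(t) for t in tasks]  # sequential delegation
```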
C. Validation Layer – Pydantic
Ensures general schema consistency and data integrity across all operations.
This distinction is important: its sole purpose here should be schema
validation, nothing beyond that.
Key Features:
• Middleware to validate incoming queries and agent responses.
• Dev-friendly type hints and error reporting.
• Mechanisms to ensure that changes in one layer do not break API contracts.
• Wrap core endpoints of other layers with Pydantic models that perform
input/output validation.
• Integrate validation middleware as a separate microservice or as
decorators within the existing service codebase.
Note: this refers to the general usage of Pydantic, not its agentic tools,
which are too unpredictable and unsuitable for production.
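A minimal sketch of that schema-only middleware, assuming Pydantic v2 (the model names and fields are hypothetical; only BaseModel, Field, and model_validate are standard Pydantic):

```python
# Sketch of schema-only validation with plain Pydantic v2 (model names and
# fields are hypothetical; no Pydantic-AI agentic features are used).
from functools import wraps
from pydantic import BaseModel, Field


class RetrieveRequest(BaseModel):
    query: str = Field(min_length=1)
    max_hops: int = Field(default=2, ge=1, le=5)


class RetrieveResponse(BaseModel):
    nodes: list[str]
    elapsed_ms: float


def validated(request_model, response_model):
    """Decorator that enforces the API contract on input and output."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(raw_request: dict) -> dict:
            request = request_model.model_validate(raw_request)      # input check
            raw_response = fn(request)
            return response_model.model_validate(raw_response).model_dump()
        return wrapper
    return decorator


@validated(RetrieveRequest, RetrieveResponse)
def retrieve(request: RetrieveRequest) -> dict:
    # Placeholder for the real retrieval call in layer D.
    return {"nodes": [], "elapsed_ms": 0.0}
```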
D. Retrieval Enhancement Layer – LlamaIndex
Finally, this layer provides recursive, multi-hop retrieval enhanced by
tiered caching, ensuring that complex graph queries are answered effectively.
LlamaIndex is already compatible with CrewAI, so we'll look further into how
that compatibility has been implemented.
Key Features:
• Recursive retrieval strategies that work well with hierarchical graph
caching.
• Integration with HugeGraph’s OLAP engine for analytical queries.
• Modular “runnables” inspired by LangChain that allow flexible composition
of retrieval steps.
• Expose LlamaIndex’s retrieval engine via an API that accepts complex,
multi-hop query parameters.
• Use a caching strategy that combines in-memory (for fast lookups) and
persistent (RocksDB) storage to accelerate repeated queries.
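A minimal sketch of the tiered caching idea: the retriever argument stands in for LlamaIndex's recursive retriever, and the persistent store is a plain dict standing in for a RocksDB-backed store (e.g. via rocksdict), both assumptions for illustration:

```python
# Sketch of a tiered cache wrapped around an arbitrary retriever callable.
# Assumptions: `retriever` stands in for LlamaIndex's recursive retriever,
# `persistent` stands in for the RocksDB-backed store.
from typing import Callable, MutableMapping


class TieredCache:
    def __init__(self, retriever: Callable[[str], list[str]],
                 persistent: MutableMapping[str, list[str]]):
        self._memory: dict[str, list[str]] = {}   # fast in-memory tier
        self._persistent = persistent             # persistent (RocksDB) tier
        self._retriever = retriever               # live multi-hop retrieval

    def retrieve(self, query: str) -> list[str]:
        if query in self._memory:                  # 1. in-memory hit
            return self._memory[query]
        if query in self._persistent:              # 2. persistent hit
            result = self._persistent[query]
        else:                                      # 3. live retrieval
            result = self._retriever(query)
            self._persistent[query] = result
        self._memory[query] = result               # promote to memory tier
        return result


# Usage with stand-ins for the real retriever and RocksDB store:
cache = TieredCache(retriever=lambda q: [f"node-for:{q}"], persistent={})
print(cache.retrieve("which entities are related to X?"))
```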
Summary of the plan, with general key points and implementation steps:
- RESTful API endpoints for query submission, workflow orchestration,
validation, and retrieval.
- A Python SDK (e.g., HG_agentic and HG_orchestrator) that abstracts away
the internal microservices and provides simple functions for (examples):
Creating agents via plain-English commands.
Configuring custom workflows (sequential, parallel, conditional).
Integrating with existing agent systems (AutoGen).
- Define API endpoints for each core service. For example:
> /query/l1 for Agno-based L1 queries.
> /workflow/submit for submitting orchestration tasks.
> /validate for schema checks.
> /retrieve for multi-hop retrieval.
- The Python SDK wraps these endpoints and provides high-level functions,
error handling, and logging (see the client sketch after this list).
- The beginner mode is pre-fine-tuned with robust LLMs using few-shot or
one-shot prompting.
- It offers a simplified interface where users only need to provide a natural
language prompt.
- The customizable mode exposes a pipeline where developers can modify key
components (LLM selection, prompt configuration, integration with vector
databases like Pinecone, FAISS, Qdrant).
- Leverage the modular “runnables” design inspired by LangChain to allow
easy insertion or replacement of retrieval steps.
- Minimize latency by combining HugeGraph’s native caching (e.g., via
RocksDB) with Agno’s shared memory features.
- Develop a caching microservice that first checks an in-memory cache and
then falls back to RocksDB.
- Ensure that cached results are seamlessly used across L1 and multi-hop
retrieval layers.
- Package each architectural layer as its own Docker container.
- Use orchestration tools (e.g., Kubernetes).
- Define strict API contracts between services.
- Integrate Prometheus (or a similar tool) into each microservice to collect
metrics.
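A thin SDK client could then simply wrap those endpoints; the base URL, paths, and payload shapes below are illustrative only, taken from the endpoint list above:

```python
# Sketch of an HG_agentic SDK client wrapping the proposed REST endpoints
# (base URL, paths, and payload shapes are illustrative, not a final contract).
import requests


class HGAgenticClient:
    def __init__(self, base_url: str = "http://localhost:8080", timeout: float = 10.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _post(self, path: str, payload: dict) -> dict:
        response = requests.post(f"{self.base_url}{path}", json=payload,
                                 timeout=self.timeout)
        response.raise_for_status()
        return response.json()

    def query_l1(self, statement: str) -> dict:
        return self._post("/query/l1", {"statement": statement})

    def submit_workflow(self, prompt: str, parallel: bool = True) -> dict:
        return self._post("/workflow/submit", {"prompt": prompt, "parallel": parallel})

    def validate(self, payload: dict) -> dict:
        return self._post("/validate", payload)

    def retrieve(self, query: str, max_hops: int = 2) -> dict:
        return self._post("/retrieve", {"query": query, "max_hops": max_hops})
```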
graph TD
A[User Query/Input] --> B{HTTP API Gateway}
B --> C[Agno L1 Query Service]
B --> D[CrewFlow Orchestrator]
D --> E[Dynamic Agent Creation]
E --> F[Workflow Execution]
F --> G[Pydantic Validation Middleware]
D --> H[Retrieve Request]
H --> I[LlamaIndex Recursive Retriever]
I --> J[Hybrid Caching Layer (RocksDB + Shared Memory)]
G & J --> K[Result Aggregator]
K --> L[HTTP API Gateway -> Response]
What are your thoughts on this approach, @imbajin? I'd also like your
thoughts on what I mentioned regarding LN queries and how we'd go about
handling them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]