Kryst4lDem0ni4s commented on issue #183:
URL: https://github.com/apache/incubator-hugegraph-ai/issues/183#issuecomment-2692203413

   I looked further into what @chiruu12 suggested about not using off-the-shelf agentic components: they can keep developers from understanding critical behaviors, so over time the service's behavior can drift out of control.
   Indeed, it would be possible to write a custom HG_agentic library that borrows only the necessary pieces, so HugeGraph can keep control over the logic and integration details.
   
   How about combining all of our suggestions into a dual-mode, modular GraphRAG system that integrates LlamaIndex, Pydantic-AI, CrewFlow, and Agno, while avoiding the dependency hell @Aryankb warned about?
   
   As a hybrid GraphRAG system, it would support two modes:
   • A beginner-friendly “agentic retriever” that is pre-fine-tuned with robust 
LLM prompting for straightforward use cases, and
   • A customizable mode for advanced developers who need to tailor retrieval, 
orchestration, and validation mechanisms.
   
   Key design principles so that everyone can get a good night's sleep:
   • Modularity & Microservices: standalone services with clearly defined APIs.
   • Dual-Mode Operation: ease of use and deep customization.
   • Transparent Integration: extracting core functionalities and integrating 
them in-house.
   • Extensive Logging & Monitoring: via Prometheus (see the sketch after this list).
   • Containerization: isolate dependencies.
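
   To make the Prometheus point concrete, here is a minimal instrumentation sketch using prometheus_client; the metric names and the instrumented handler are illustrative placeholders, not an agreed convention:

```python
# Minimal sketch of the "Extensive Logging & Monitoring" principle using
# prometheus_client. Metric and label names are placeholders, not a fixed
# HugeGraph convention.
import time
from prometheus_client import Counter, Histogram, start_http_server

QUERY_COUNT = Counter("hg_agentic_queries_total",
                      "Queries handled, by layer", ["layer"])
QUERY_LATENCY = Histogram("hg_agentic_query_seconds",
                          "Query latency in seconds", ["layer"])

def handle_l1_query(query: str) -> dict:
    """Hypothetical L1 handler instrumented with Prometheus metrics."""
    QUERY_COUNT.labels(layer="l1").inc()
    with QUERY_LATENCY.labels(layer="l1").time():
        # ... the real Agno/HugeGraph call would go here ...
        time.sleep(0.01)  # stand-in for actual work
        return {"query": query, "result": []}

if __name__ == "__main__":
    start_http_server(9090)  # Prometheus scrapes this endpoint
    handle_l1_query("g.V().limit(1)")
```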
   
   Architectural Layers & Components:
   A. Base Layer – Agno for L1 Queries
   Handles high-frequency, low-latency queries (e.g., simple entity lookups) with optimized parallel execution. Beyond this point we also need to settle on the right strategy for handling LN queries.
   
   Key Features:
   • Fast execution with low memory footprint.
   • Built-in Gremlin-Cypher transpiler for hybrid query support.
   • Integration with a hybrid caching layer that combines Agno shared memory 
and RocksDB.
   • Wrap Agno’s core query engine in a microservice that exposes an HTTP 
endpoint.
   • Queries can be routed through a lightweight, L1-specific pre-processing step that selects between cache lookup and live query execution.
   
   Once abstracted into our own agentic library, this component will be the base for all performance optimizations.
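
   As a starting point, here is a minimal sketch of that microservice wrapper (FastAPI assumed; run_live_query stands in for the real Agno-backed execution path, and the /query/l1 route is only a proposal):

```python
# Minimal sketch of wrapping the L1 query engine in an HTTP microservice.
# The Agno call itself is stubbed out; `run_live_query` and the /query/l1
# route are illustrative names, not a fixed contract.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
_cache: dict[str, dict] = {}  # stand-in for the shared-memory cache tier

class L1Query(BaseModel):
    statement: str        # Gremlin or Cypher text
    use_cache: bool = True

def run_live_query(statement: str) -> dict:
    """Placeholder for the real Agno-backed execution path."""
    return {"statement": statement, "rows": []}

@app.post("/query/l1")
def query_l1(q: L1Query) -> dict:
    # Lightweight pre-processing step: pick cache vs. live execution.
    if q.use_cache and q.statement in _cache:
        return {"source": "cache", **_cache[q.statement]}
    result = run_live_query(q.statement)
    _cache[q.statement] = result
    return {"source": "live", **result}
```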
   
   B. Orchestration Layer – CrewAI for Complex Workflows
   This layer would manage multi-hop, dynamic queries and agent workflows that require intent classification and asynchronous execution, while still allowing customization.
   
   Key Features:
   • Dynamic intent classification powered by domain-specific embeddings 
(integrated with HugeGraph).
   • Event-driven workflow, where subtasks are dynamically generated from a 
user’s plain-English prompt.
   • Built-in support for sequencing (sequential/parallel) and conditional 
delegation of agent tasks.
   • Adapt core functionalities from CrewAI (CrewFlow) to create a custom 
orchestration module.
   • Define a clear API contract for submitting workflows, retrieving status, 
and handling error/fallback logic.
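
   A sketch of what that API contract could look like (the CrewAI/CrewFlow execution itself is stubbed out; endpoint paths and field names are proposals, nothing more):

```python
# Sketch of the workflow API contract (submit + status). The orchestration
# logic adapted from CrewAI/CrewFlow is stubbed; all names are proposals.
import uuid
from enum import Enum
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class WorkflowStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

class WorkflowRequest(BaseModel):
    prompt: str                  # plain-English task description
    mode: str = "sequential"     # "sequential" | "parallel" | "conditional"

_jobs: dict[str, WorkflowStatus] = {}

@app.post("/workflow/submit")
def submit(req: WorkflowRequest) -> dict:
    job_id = str(uuid.uuid4())
    _jobs[job_id] = WorkflowStatus.PENDING
    # Intent classification and subtask generation would be dispatched here,
    # e.g. to a background task running the adapted orchestration module.
    return {"job_id": job_id, "status": _jobs[job_id]}

@app.get("/workflow/{job_id}/status")
def status(job_id: str) -> dict:
    if job_id not in _jobs:
        raise HTTPException(status_code=404, detail="unknown job")
    return {"job_id": job_id, "status": _jobs[job_id]}
```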
   
   C. Validation Layer – Pydantic
   Enforces general schema consistency and data integrity across all operations. To be clear, its sole purpose here is schema validation, and nothing beyond that.
   
   Key Features:
   • Middleware to validate incoming queries and agent responses.
   • Dev-friendly type hints and error reporting.
   • Mechanisms to ensure that changes in one layer do not break API contracts.
   • Wrap core endpoints of other layers with Pydantic models that perform 
input/output validation.
   • Integrate validation middleware as a separate microservice or as 
decorators within the existing service codebase.
   
   Note: this is the general usage of Pydantic, not its agentic tooling; the latter is too unpredictable and unsuitable for production.
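
   Here is a minimal sketch of the decorator option using plain Pydantic (v2 assumed); the model fields are illustrative, not the final HugeGraph schema:

```python
# Sketch of "validation as decorators": plain Pydantic v2 models checking
# the input and output of a service function. Fields are illustrative.
from functools import wraps
from pydantic import BaseModel

class RetrieveRequest(BaseModel):
    start_vertex: str
    max_hops: int = 2

class RetrieveResponse(BaseModel):
    paths: list[list[str]]

def validated(in_model, out_model):
    """Wrap a handler so its input and output are schema-checked."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(payload: dict) -> dict:
            req = in_model.model_validate(payload)   # raises on bad input
            result = fn(req)
            return out_model.model_validate(result).model_dump()
        return wrapper
    return decorator

@validated(RetrieveRequest, RetrieveResponse)
def retrieve(req: RetrieveRequest) -> dict:
    # Placeholder traversal; the real call would hit HugeGraph.
    return {"paths": [[req.start_vertex]]}

print(retrieve({"start_vertex": "person:1", "max_hops": 1}))
```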
   
   D. Retrieval Enhancement Layer – LlamaIndex
   Finally, this layer provides recursive, multi-hop retrieval enhanced by tiered caching, ensuring that complex graph queries are answered effectively. LlamaIndex is already compatible with CrewAI, so we'll look further into how that compatibility is implemented.

   Key Features:
   • Recursive retrieval strategies that work well with hierarchical graph 
caching.
   • Integration with HugeGraph’s OLAP engine for analytical queries.
   • Modular “runnables” inspired by LangChain that allow flexible composition 
of retrieval steps.
   • Expose LlamaIndex’s retrieval engine via an API that accepts complex, 
multi-hop query parameters.
   • Use a caching strategy that combines in-memory (for fast lookups) and 
persistent (RocksDB) storage to accelerate repeated queries.
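
   A rough sketch of that two-tier lookup, assuming the python-rocksdb binding (key layout, eviction, and TTL handling are deliberately left out and would need a real design):

```python
# Sketch of the tiered cache described above: in-memory first, RocksDB
# fallback. Assumes the python-rocksdb binding; all names are illustrative.
import json
import rocksdb

class TieredCache:
    def __init__(self, path: str = "/tmp/hg_cache.db"):
        self._mem: dict[str, dict] = {}
        self._db = rocksdb.DB(path, rocksdb.Options(create_if_missing=True))

    def get(self, key: str) -> dict | None:
        if key in self._mem:                 # tier 1: in-memory
            return self._mem[key]
        raw = self._db.get(key.encode())     # tier 2: persistent RocksDB
        if raw is not None:
            value = json.loads(raw)
            self._mem[key] = value           # promote to tier 1
            return value
        return None

    def put(self, key: str, value: dict) -> None:
        self._mem[key] = value
        self._db.put(key.encode(), json.dumps(value).encode())
```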
   
   
   Summary of the plan, with key points and implementation steps:
   - RESTful API endpoints for query submission, workflow orchestration, 
validation, and retrieval.
   - A Python SDK (e.g., HG_agentic and HG_orchestrator) that abstracts away the internal microservices and provides simple functions for, e.g. (see the SDK sketch after this list):
   > Creating agents via plain-English commands.
   > Configuring custom workflows (sequential, parallel, conditional).
   > Integrating with existing agent systems (AutoGen).
   - Define API endpoints for each core service. For example:
   > /query/l1 for Agno-based L1 queries.
   > /workflow/submit for submitting orchestration tasks.
   > /validate for schema checks.
   > /retrieve for multi-hop retrieval. 
   - The Python SDK wraps these endpoints and provides high-level functions, 
error handling, and logging.
   - Beginner mode: pre-fine-tuned with robust few-shot or one-shot LLM prompting, offering a simplified interface where users only need to provide a natural-language prompt.
   - Customizable mode: a pipeline where developers can modify key components (LLM selection, prompt configuration, integration with vector databases like Pinecone, FAISS, Qdrant).
   - Leverage the modular “runnables” design inspired by LangChain to allow 
easy insertion or replacement of retrieval steps.
   - Minimize latency by combining HugeGraph’s native caching (e.g., via 
RocksDB) with Agno’s shared memory features.
   - Develop a caching microservice that first checks an in-memory cache and 
then falls back to RocksDB.
   - Ensure that cached results are seamlessly used across L1 and multi-hop 
retrieval layers.
   - Package each architectural layer as its own Docker container.
   - Use orchestration tools (e.g., Kubernetes).
   - Define strict API contracts between services.
   - Integrate Prometheus (or a similar tool) into each microservice to collect metrics.
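
   To show what the SDK surface could look like, here is a thin-wrapper sketch over the endpoints proposed above; none of this exists in hugegraph-ai yet:

```python
# Sketch of the proposed HG_agentic SDK surface: thin wrappers over the
# REST endpoints listed above. Paths and fields follow this proposal only.
import requests

class HGAgenticClient:
    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    def _post(self, path: str, payload: dict) -> dict:
        resp = requests.post(f"{self.base_url}{path}", json=payload, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def l1_query(self, statement: str) -> dict:
        return self._post("/query/l1", {"statement": statement})

    def submit_workflow(self, prompt: str, mode: str = "sequential") -> dict:
        return self._post("/workflow/submit", {"prompt": prompt, "mode": mode})

    def retrieve(self, start_vertex: str, max_hops: int = 2) -> dict:
        return self._post("/retrieve",
                          {"start_vertex": start_vertex, "max_hops": max_hops})

# Beginner mode in practice: one natural-language call does everything.
client = HGAgenticClient()
job = client.submit_workflow("Find papers co-authored by Alice and Bob")
```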
   
   
```mermaid
graph TD
  A[User Query/Input] --> B{HTTP API Gateway}
  B --> C[Agno L1 Query Service]
  B --> D[CrewFlow Orchestrator]
  D --> E[Dynamic Agent Creation]
  E --> F[Workflow Execution]
  F --> G[Pydantic Validation Middleware]
  D --> H[Retrieve Request]
  H --> I[LlamaIndex Recursive Retriever]
  I --> J["Hybrid Caching Layer (RocksDB + Shared Memory)"]
  G & J --> K[Result Aggregator]
  K --> L["HTTP API Gateway -> Response"]
```
   
   What are your thoughts on this approach, @imbajin? I'd also like your thoughts on what I mentioned regarding LN queries and how we'd go about handling them.

