This is an automated email from the ASF dual-hosted git repository.

jin pushed a commit to branch codebase
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-ai.git
commit 5ef2fa99f41c6f44a82b0c301dc6a65651075a9d
Author: imbajin <j...@apache.org>
AuthorDate: Thu Jul 24 21:15:48 2025 +0800

    chore(llm): add a basic LLM/AI coding instruction file
---
 hugegraph-llm/basic-introduction.md | 111 ++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/hugegraph-llm/basic-introduction.md b/hugegraph-llm/basic-introduction.md
new file mode 100644
index 00000000..b8f229f6
--- /dev/null
+++ b/hugegraph-llm/basic-introduction.md
@@ -0,0 +1,111 @@
+# Basic Introduction
+
+This file provides guidance to AI coding tools and developers working with code in this repository.
+
+## Project Overview
+
+HugeGraph-LLM is a comprehensive toolkit that bridges graph databases and large language models, and is part of the Apache HugeGraph AI ecosystem. It enables seamless integration between HugeGraph and LLMs for building intelligent applications, with three main capabilities: Knowledge Graph Construction, Graph-Enhanced RAG, and Text2Gremlin query generation.
+
+## Tech Stack
+
+- **Language**: Python 3.10+ (UV package manager required)
+- **Framework**: FastAPI + Gradio for web interfaces
+- **Graph Database**: HugeGraph Server 1.3+ (recommended: 1.5+)
+- **LLM Integration**: LiteLLM (supports OpenAI, Ollama, Qianfan, etc.)
+- **Vector Operations**: FAISS, NumPy, Pandas
+- **Key Dependencies**: hugegraph-python-client, LangChain, NLTK, Jieba
+
+## Essential Commands
+
+### Setup and Development
+```bash
+# Install the UV package manager (if not already installed)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Install dependencies and create a virtual environment
+uv sync
+
+# Activate the virtual environment
+source .venv/bin/activate
+
+# Generate configuration files
+python -m hugegraph_llm.config.generate --update
+
+# Download NLTK data (required for text processing)
+python ./src/hugegraph_llm/operators/common_op/nltk_helper.py
+```
+
+### Running the Application
+```bash
+# Launch the main RAG demo application
+python -m hugegraph_llm.demo.rag_demo.app
+
+# Custom host/port
+python -m hugegraph_llm.demo.rag_demo.app --host 127.0.0.1 --port 18001
+```
+
+### Testing
+```bash
+# Run tests (located in src/tests/)
+python -m unittest discover src/tests/
+```
+
+### Docker Deployment
+```bash
+# Full-stack deployment
+cd docker
+cp env.template .env  # Edit PROJECT_PATH
+docker-compose -f docker-compose-network.yml up -d
+```
+
+## Architecture Overview
+
+### Core Directory Structure
+- `src/hugegraph_llm/api/` - FastAPI endpoints (rag_api.py, admin_api.py)
+- `src/hugegraph_llm/demo/rag_demo/` - Main Gradio UI application
+- `src/hugegraph_llm/operators/` - Core processing pipelines
+- `src/hugegraph_llm/models/` - LLM, embedding, and reranker implementations
+- `src/hugegraph_llm/indices/` - Vector and graph indexing
+- `src/hugegraph_llm/config/` - Configuration management
+- `src/hugegraph_llm/utils/` - Utilities, logging, decorators
+
+### Key Processing Pipelines
+
+1. **KG Construction** (`operators/kg_construction_task.py`)
+   - Text chunking and vectorization pipeline
+   - Schema management and validation
+   - Information extraction using LLMs
+   - Committing graph data to HugeGraph
+
+2. **Graph RAG** (`operators/graph_rag_task.py`)
+   - Multi-modal retrieval (vector, graph, hybrid)
+   - Keyword extraction and entity matching
+   - Graph traversal and Gremlin query generation
+   - Result merging and reranking
+
+3. **Text2Gremlin** (`operators/gremlin_generate_task.py`)
+   - Natural language to Gremlin query conversion
+   - Template-based and few-shot learning approaches
+
+### Configuration Management
+
+- Main config: `.env` file (generate it with the config.generate module)
+- Prompt config: `src/hugegraph_llm/resources/demo/config_prompt.yaml`
+- HugeGraph connection settings live in environment variables
+- LLM provider configuration goes through LiteLLM
+
+## Development Workflow
+
+1. **Prerequisites**: Ensure HugeGraph Server is running and an LLM provider is configured
+2. **Environment Setup**: Use UV for dependency management and activate the virtual environment
+3. **Configuration**: Generate configs and set up the .env file with proper credentials
+4. **Development**: Use the Gradio demo for interactive testing and FastAPI for programmatic access
+5. **Testing**: Unit tests use the standard unittest framework in src/tests/
+
+## Important Notes
+
+- Always use the UV package manager instead of pip for dependency management
+- HugeGraph Server must be accessible before running the application
+- The NLTK data download is required for text processing operations
+- Configuration generation must be re-run after any major environment change
+- The system supports multiple LLM providers through the LiteLLM abstraction
\ No newline at end of file
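The text-chunking step of the KG Construction pipeline in the file above can be pictured with a minimal sketch. This is illustrative only: the function name and parameters are hypothetical, and the real pipeline in `operators/kg_construction_task.py` additionally handles vectorization, schema validation, and committing data to HugeGraph.

```python
# Hypothetical fixed-size chunking helper with overlap — not the
# project's actual API, just a sketch of the technique.
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

# Each chunk is at most `size` characters; consecutive chunks share
# `overlap` characters so extraction doesn't lose context at boundaries.
print(len(chunk_text("x" * 500)))  # 3
```

The overlap is what lets LLM-based information extraction see entities that would otherwise be split across a chunk boundary.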
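The "result merging and reranking" stage of the Graph RAG pipeline can be sketched conceptually as follows. The data shape — `(doc_id, score)` pairs from each retriever — and the function are assumptions for illustration; the real logic lives in `operators/graph_rag_task.py` and can use a dedicated reranker model rather than raw scores.

```python
# Conceptual merge of vector-retrieval and graph-retrieval hits
# (hypothetical shapes; not the project's actual interface).
def merge_and_rerank(vector_hits, graph_hits, top_k=3):
    best = {}
    # Deduplicate by document id, keeping the higher score.
    for doc_id, score in vector_hits + graph_hits:
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    # Rerank by score, descending, and truncate to top_k.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

hits = merge_and_rerank([("d1", 0.9), ("d2", 0.4)], [("d2", 0.7), ("d3", 0.5)])
print(hits)  # [('d1', 0.9), ('d2', 0.7), ('d3', 0.5)]
```

The key idea is that hybrid retrieval produces overlapping candidate sets, so deduplication must happen before reranking.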
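The few-shot approach mentioned for Text2Gremlin can be sketched as simple prompt assembly. The example question/query pair and the function below are hypothetical; see `operators/gremlin_generate_task.py` for the real template-based implementation.

```python
# Hypothetical few-shot prompt assembly for Text2Gremlin — a sketch of
# the technique, not the project's actual prompt or API.
EXAMPLES = [
    ("Who are Alice's friends?",
     "g.V().has('person','name','Alice').out('friend')"),
]

def build_prompt(question, examples=EXAMPLES):
    lines = ["Translate the natural-language question into a Gremlin query."]
    for q, g in examples:
        lines.append(f"Q: {q}\nGremlin: {g}")
    # The trailing "Gremlin:" cue prompts the LLM to complete the query.
    lines.append(f"Q: {question}\nGremlin:")
    return "\n".join(lines)

print(build_prompt("Which person does Bob know?"))
```

In the template-based variant, the example pairs would be retrieved from a query template store rather than hard-coded.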