This is an automated email from the ASF dual-hosted git repository.

jin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-ai.git
The following commit(s) were added to refs/heads/main by this push: new 99cd5e40 feat(llm): support switch prompt language in EN/CN (#269) 99cd5e40 is described below commit 99cd5e406c9796384d4056b760bf847c857e9969 Author: day0n <niu...@gmail.com> AuthorDate: Wed Aug 6 18:10:31 2025 +0800 feat(llm): support switch prompt language in EN/CN (#269) close #260 - Add prompt language indicator <img width="1878" height="89" alt="image" src="https://github.com/user-attachments/assets/a0040611-8c5d-45c5-8aa7-580fd4e5367d" /> <img width="1853" height="52" alt="image" src="https://github.com/user-attachments/assets/49201c7b-e6e9-403c-bc76-7e2b6e03ac8d" /> - Add query examples for CN/EN support <img width="1233" height="407" alt="image" src="https://github.com/user-attachments/assets/4c06cd78-a6c3-4749-b132-a9db42a29429" /> -update README with prompt language support details -Support switching prompt EN/CN support switch prompt language EN/CN <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **新功能** * 增加了对中英文提示词的支持,可通过 `.env` 文件中的 `LANGUAGE` 配置切换语言,默认使用英文。 * 配置文件中新增 `language` 字段,支持 "EN" 和 "CN" 两种语言选项。 * 自动检测并同步 `.env` 配置与 YAML 提示词文件的语言设置,确保提示词内容与当前语言一致,语言切换时自动重新生成提示词。 * 提示词属性重命名以区分语言版本,新增语言参数支持提示词生成。 * 演示界面新增语言指示器,动态显示当前使用的提示词语言。 * 查询示例根据语言自动加载对应的中英文版本。 * 新增中文查询示例资源文件,丰富多样的图数据库查询案例。 * **文档** * README 增加了“语言支持”说明,指导如何切换中英文提示词。 * 新增详细的配置说明文档 `CONFIGURATION.md`,涵盖系统、提示词、部署和项目配置,提供完整配置参数说明和使用示例。 * **样式** * 优化了 README 部分格式,提升可读性。 * pylint 配置调整,改善对动态属性和兼容性警告的处理。 * 新增语言指示器相关样式,支持界面语言状态的视觉展示。 <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: imbajin <j...@apache.org> --- hugegraph-llm/README.md | 17 +- hugegraph-llm/config.md | 280 ++++++++++++ hugegraph-llm/src/hugegraph_llm/config/__init__.py | 4 +- hugegraph-llm/src/hugegraph_llm/config/generate.py | 2 +- .../src/hugegraph_llm/config/llm_config.py | 2 +- .../config/models/base_prompt_config.py | 134 +++--- .../src/hugegraph_llm/config/prompt_config.py | 15 +- .../src/hugegraph_llm/demo/rag_demo/app.py | 35 +- .../hugegraph_llm/demo/rag_demo/configs_block.py | 473 ++++++++++++++++----- .../demo/rag_demo/vector_graph_block.py | 209 ++++++--- .../operators/llm_op/prompt_generate.py | 3 +- .../src/hugegraph_llm/resources/demo/css.py | 109 +++++ .../prompt_examples/query_examples_CN.json | 9 + style/pylint.conf | 3 +- 14 files changed, 1041 insertions(+), 254 deletions(-) diff --git a/hugegraph-llm/README.md b/hugegraph-llm/README.md index 9fa57603..63642557 100644 --- a/hugegraph-llm/README.md +++ b/hugegraph-llm/README.md @@ -32,6 +32,7 @@ The fastest way to get started with both HugeGraph Server and RAG Service: # 1. Set up environment cp docker/env.template docker/.env # Edit docker/.env and set PROJECT_PATH to your actual project path +# See "config.md" for all available configuration options # If there is not a configuration file (named .env) under hugegraph-llm, run the following command cd hugegraph-llm && touch .env && cd .. @@ -87,6 +88,8 @@ curl -LsSf https://astral.sh/uv/install.sh | sh git clone https://github.com/apache/incubator-hugegraph-ai.git cd incubator-hugegraph-ai/hugegraph-llm +# Configure environment (see config.md for detailed options), .env will auto create if not exists + # 4. 
Install dependencies and activate environment # NOTE: If download is slow, uncomment mirror lines in ../pyproject.toml or use: uv config --global index.url https://pypi.tuna.tsinghua.edu.cn/simple # Or create local uv.toml with mirror settings to avoid git diff (see uv.toml example in root) @@ -219,6 +222,16 @@ After running the demo, configuration files are automatically generated: - **Environment**: `hugegraph-llm/.env` - **Prompts**: `hugegraph-llm/src/hugegraph_llm/resources/demo/config_prompt.yaml` +### Language Support + +The system supports both English and Chinese prompts. To switch languages: + +1. **Edit `.env` file**: Change `LANGUAGE=en` to `LANGUAGE=CN` (or vice versa) +2. **Restart the application**: The system will automatically regenerate prompts in the selected language + +**Supported Values:** +- `LANGUAGE=EN` - English prompts (default) +- `LANGUAGE=CN` - Chinese prompts (中文提示词) > [!NOTE] > Configuration changes are automatically saved when using the web interface. > For manual changes, simply refresh the page to load updates. @@ -229,14 +242,14 @@ After running the demo, configuration files are automatically generated: > [!IMPORTANT] > **For developers contributing to hugegraph-llm with AI coding assistance:** -> +> > - **Start Here**: First read `../rules/README.md` for the complete > AI-assisted development workflow > - **Module Context**: Rename `basic-introduction.md` in this directory as > context for your LLM (e.g., `CLAUDE.md`, `copilot-instructions.md`) > - **Code Analysis**: Follow comprehensive analysis methodology in > `../rules/prompts/project-deep.md` > - **Documentation**: Maintain structured documentation standards from > `../rules/prompts/project-general.md` > - **Quality Standards**: Ensure type annotations, proper testing, and > consistent patterns > - **Business Logic**: Focus on graph-LLM integration logic and RAG pipeline > optimization -> +> > These guidelines ensure consistent code quality and maintainable graph-AI > integrations. ## 📚 Additional Resources diff --git a/hugegraph-llm/config.md b/hugegraph-llm/config.md new file mode 100644 index 00000000..a55172f3 --- /dev/null +++ b/hugegraph-llm/config.md @@ -0,0 +1,280 @@ +# HugeGraph LLM 配置选项 (详解) + +本文档详细说明了 HugeGraph LLM 项目中所有的配置选项。配置分为以下几类: + +1. **基础配置**:通过 `.env` 文件管理 +2. **Prompt 配置**:通过 `config_prompt.yaml` 文件管理 +3. **Docker 配置**:通过 Docker 和 Helm 配置文件管理 +4. 
**项目配置**:通过 `pyproject.toml` 和 `JSON` 文件管理 + +## 目录 + +- [.env 配置文件](#env-配置文件) + - [基础配置](#基础配置) + - [OpenAI 配置](#openai-配置) + - [Ollama 配置](#ollama-配置) + - [LiteLLM 配置](#litellm-配置) + - [重排序配置](#重排序配置) + - [HugeGraph 数据库配置](#hugegraph-数据库配置) + - [管理员配置](#管理员配置) +- [配置使用示例](#配置使用示例) +- [配置文件位置](#配置文件位置) + +## .env 配置文件 + +`.env` 文件位于 `hugegraph-llm/` 目录下,包含了系统运行所需的所有配置项。 + +### 基础配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|---------------------|--------------------------------------------------------|--------|---------------------------------------| +| `LANGUAGE` | Literal["EN", "CN"] | EN | prompt语言,支持 EN(英文)和 CN(中文) | +| `CHAT_LLM_TYPE` | Literal["openai", "litellm", "ollama/local"] | openai | 聊天 LLM 类型:openai/litellm/ollama/local | +| `EXTRACT_LLM_TYPE` | Literal["openai", "litellm", "ollama/local"] | openai | 信息提取 LLM 类型 | +| `TEXT2GQL_LLM_TYPE` | Literal["openai", "litellm", "ollama/local"] | openai | 文本转 GQL LLM 类型 | +| `EMBEDDING_TYPE` | Optional[Literal["openai", "litellm", "ollama/local"]] | openai | 嵌入模型类型 | +| `RERANKER_TYPE` | Optional[Literal["cohere", "siliconflow"]] | None | 重排序模型类型:cohere/siliconflow | + +### OpenAI 配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|----------------------------------|------------------|---------------------------|---------------------------| +| `OPENAI_CHAT_API_BASE` | Optional[String] | https://api.openai.com/v1 | OpenAI 聊天 API 基础 URL | +| `OPENAI_CHAT_API_KEY` | Optional[String] | - | OpenAI 聊天 API 密钥 | +| `OPENAI_CHAT_LANGUAGE_MODEL` | Optional[String] | gpt-4.1-mini | 聊天模型名称 | +| `OPENAI_CHAT_TOKENS` | Integer | 8192 | 聊天最大令牌数 | +| `OPENAI_EXTRACT_API_BASE` | Optional[String] | https://api.openai.com/v1 | OpenAI 提取 API 基础 URL | +| `OPENAI_EXTRACT_API_KEY` | Optional[String] | - | OpenAI 提取 API 密钥 | +| `OPENAI_EXTRACT_LANGUAGE_MODEL` | Optional[String] | gpt-4.1-mini | 提取模型名称 | +| `OPENAI_EXTRACT_TOKENS` | Integer | 256 | 提取最大令牌数 | +| `OPENAI_TEXT2GQL_API_BASE` | Optional[String] | https://api.openai.com/v1 | OpenAI 文本转 GQL API 基础 URL | +| `OPENAI_TEXT2GQL_API_KEY` | Optional[String] | - | OpenAI 文本转 GQL API 密钥 | +| `OPENAI_TEXT2GQL_LANGUAGE_MODEL` | Optional[String] | gpt-4.1-mini | 文本转 GQL 模型名称 | +| `OPENAI_TEXT2GQL_TOKENS` | Integer | 4096 | 文本转 GQL 最大令牌数 | +| `OPENAI_EMBEDDING_API_BASE` | Optional[String] | https://api.openai.com/v1 | OpenAI 嵌入 API 基础 URL | +| `OPENAI_EMBEDDING_API_KEY` | Optional[String] | - | OpenAI 嵌入 API 密钥 | +| `OPENAI_EMBEDDING_MODEL` | Optional[String] | text-embedding-3-small | 嵌入模型名称 | + +#### OpenAI 环境变量 + +| 环境变量 | 对应配置项 | 说明 | +|-----------------------------|---------------------------|-------------------------------| +| `OPENAI_BASE_URL` | 所有 OpenAI API_BASE | 通用 OpenAI API 基础 URL | +| `OPENAI_API_KEY` | 所有 OpenAI API_KEY | 通用 OpenAI API 密钥 | +| `OPENAI_EMBEDDING_BASE_URL` | OPENAI_EMBEDDING_API_BASE | OpenAI 嵌入 API 基础 URL | +| `OPENAI_EMBEDDING_API_KEY` | OPENAI_EMBEDDING_API_KEY | OpenAI 嵌入 API 密钥 | +| `CO_API_URL` | COHERE_BASE_URL | Cohere API URL(环境变量 fallback) | + +### Ollama 配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|----------------------------------|-------------------|-----------|---------------------| +| `OLLAMA_CHAT_HOST` | Optional[String] | 127.0.0.1 | Ollama 聊天服务主机 | +| `OLLAMA_CHAT_PORT` | Optional[Integer] | 11434 | Ollama 聊天服务端口 | +| `OLLAMA_CHAT_LANGUAGE_MODEL` | Optional[String] | - | Ollama 聊天模型名称 | +| `OLLAMA_EXTRACT_HOST` | Optional[String] | 127.0.0.1 | Ollama 提取服务主机 | +| `OLLAMA_EXTRACT_PORT` | Optional[Integer] | 11434 | Ollama 提取服务端口 | +| `OLLAMA_EXTRACT_LANGUAGE_MODEL` | Optional[String] | - | Ollama 提取模型名称 | +| 
`OLLAMA_TEXT2GQL_HOST` | Optional[String] | 127.0.0.1 | Ollama 文本转 GQL 服务主机 | +| `OLLAMA_TEXT2GQL_PORT` | Optional[Integer] | 11434 | Ollama 文本转 GQL 服务端口 | +| `OLLAMA_TEXT2GQL_LANGUAGE_MODEL` | Optional[String] | - | Ollama 文本转 GQL 模型名称 | +| `OLLAMA_EMBEDDING_HOST` | Optional[String] | 127.0.0.1 | Ollama 嵌入服务主机 | +| `OLLAMA_EMBEDDING_PORT` | Optional[Integer] | 11434 | Ollama 嵌入服务端口 | +| `OLLAMA_EMBEDDING_MODEL` | Optional[String] | - | Ollama 嵌入模型名称 | + +### LiteLLM 配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|-----------------------------------|------------------|-------------------------------|----------------------------| +| `LITELLM_CHAT_API_KEY` | Optional[String] | - | LiteLLM 聊天 API 密钥 | +| `LITELLM_CHAT_API_BASE` | Optional[String] | - | LiteLLM 聊天 API 基础 URL | +| `LITELLM_CHAT_LANGUAGE_MODEL` | Optional[String] | openai/gpt-4.1-mini | LiteLLM 聊天模型名称 | +| `LITELLM_CHAT_TOKENS` | Integer | 8192 | 聊天最大令牌数 | +| `LITELLM_EXTRACT_API_KEY` | Optional[String] | - | LiteLLM 提取 API 密钥 | +| `LITELLM_EXTRACT_API_BASE` | Optional[String] | - | LiteLLM 提取 API 基础 URL | +| `LITELLM_EXTRACT_LANGUAGE_MODEL` | Optional[String] | openai/gpt-4.1-mini | LiteLLM 提取模型名称 | +| `LITELLM_EXTRACT_TOKENS` | Integer | 256 | 提取最大令牌数 | +| `LITELLM_TEXT2GQL_API_KEY` | Optional[String] | - | LiteLLM 文本转 GQL API 密钥 | +| `LITELLM_TEXT2GQL_API_BASE` | Optional[String] | - | LiteLLM 文本转 GQL API 基础 URL | +| `LITELLM_TEXT2GQL_LANGUAGE_MODEL` | Optional[String] | openai/gpt-4.1-mini | LiteLLM 文本转 GQL 模型名称 | +| `LITELLM_TEXT2GQL_TOKENS` | Integer | 4096 | 文本转 GQL 最大令牌数 | +| `LITELLM_EMBEDDING_API_KEY` | Optional[String] | - | LiteLLM 嵌入 API 密钥 | +| `LITELLM_EMBEDDING_API_BASE` | Optional[String] | - | LiteLLM 嵌入 API 基础 URL | +| `LITELLM_EMBEDDING_MODEL` | Optional[String] | openai/text-embedding-3-small | LiteLLM 嵌入模型名称 | + +### 重排序配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|--------------------|------------------|----------------------------------|--------------------| +| `COHERE_BASE_URL` | Optional[String] | https://api.cohere.com/v1/rerank | Cohere 重排序 API URL | +| `RERANKER_API_KEY` | Optional[String] | - | 重排序 API 密钥 | +| `RERANKER_MODEL` | Optional[String] | - | 重排序模型名称 | + +### HugeGraph 数据库配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|------------------------|-------------------|----------------|--------------------| +| `GRAPH_URL` | Optional[String] | 127.0.0.1:8080 | HugeGraph 服务器地址 | +| `GRAPH_NAME` | Optional[String] | hugegraph | 图数据库名称 | +| `GRAPH_USER` | Optional[String] | admin | 数据库用户名 | +| `GRAPH_PWD` | Optional[String] | xxx | 数据库密码 | +| `GRAPH_SPACE` | Optional[String] | - | 图空间名称(可选) | +| `LIMIT_PROPERTY` | Optional[String] | "False" | 是否限制属性(注意:这是字符串类型) | +| `MAX_GRAPH_PATH` | Optional[Integer] | 10 | 最大图路径长度 | +| `MAX_GRAPH_ITEMS` | Optional[Integer] | 30 | 最大图项目数 | +| `EDGE_LIMIT_PRE_LABEL` | Optional[Integer] | 8 | 每个标签的边数限制 | +| `VECTOR_DIS_THRESHOLD` | Optional[Float] | 0.9 | 向量距离阈值 | +| `TOPK_PER_KEYWORD` | Optional[Integer] | 1 | 每个关键词返回的 TopK 数量 | +| `TOPK_RETURN_RESULTS` | Optional[Integer] | 20 | 返回结果数量 | + +### 管理员配置 + +| 配置项 | 类型 | 默认值 | 说明 | +|----------------|------------------|---------|--------------------| +| `ENABLE_LOGIN` | Optional[String] | "False" | 是否启用登录(注意:这是字符串类型) | +| `USER_TOKEN` | Optional[String] | 4321 | 用户令牌 | +| `ADMIN_TOKEN` | Optional[String] | xxxx | 管理员令牌 | + +## 配置使用示例 + +### 1. 
基础配置示例 + +```properties +# 基础设置 +LANGUAGE=EN +CHAT_LLM_TYPE=openai +EXTRACT_LLM_TYPE=openai +TEXT2GQL_LLM_TYPE=openai +EMBEDDING_TYPE=openai + +# OpenAI 配置 +OPENAI_CHAT_API_KEY=your-openai-api-key +OPENAI_CHAT_LANGUAGE_MODEL=gpt-4.1-mini +OPENAI_EMBEDDING_API_KEY=your-openai-embedding-key +OPENAI_EMBEDDING_MODEL=text-embedding-3-small + +# HugeGraph 配置 +GRAPH_URL=127.0.0.1:8080 +GRAPH_NAME=hugegraph +GRAPH_USER=admin +GRAPH_PWD=your-password +``` + +### 2. 使用 Ollama 的配置示例 + +```properties +# 使用 Ollama +CHAT_LLM_TYPE=ollama/local +EXTRACT_LLM_TYPE=ollama/local +TEXT2GQL_LLM_TYPE=ollama/local +EMBEDDING_TYPE=ollama/local + +# Ollama 模型配置 +OLLAMA_CHAT_LANGUAGE_MODEL=llama2 +OLLAMA_EXTRACT_LANGUAGE_MODEL=llama2 +OLLAMA_TEXT2GQL_LANGUAGE_MODEL=llama2 +OLLAMA_EMBEDDING_MODEL=nomic-embed-text + +# Ollama 服务配置(如果需要自定义) +OLLAMA_CHAT_HOST=127.0.0.1 +OLLAMA_CHAT_PORT=11434 +OLLAMA_EXTRACT_HOST=127.0.0.1 +OLLAMA_EXTRACT_PORT=11434 +OLLAMA_TEXT2GQL_HOST=127.0.0.1 +OLLAMA_TEXT2GQL_PORT=11434 +OLLAMA_EMBEDDING_HOST=127.0.0.1 +OLLAMA_EMBEDDING_PORT=11434 +``` + +### 3. 代码中使用配置 + +```python +from hugegraph_llm.config import llm_settings, huge_settings + +# 使用 LLM 配置 +print(f"当前语言: {llm_settings.language}") +print(f"聊天模型类型: {llm_settings.chat_llm_type}") + +# 使用图数据库配置 +print(f"图数据库地址: {huge_settings.graph_url}") +print(f"数据库名称: {huge_settings.graph_name}") +``` + +或者直接导入配置类: + +```python +from hugegraph_llm.config.llm_config import LLMConfig +from hugegraph_llm.config.hugegraph_config import HugeGraphConfig + +# 创建配置实例 +llm_config = LLMConfig() +graph_config = HugeGraphConfig() + +print(f"当前语言: {llm_config.language}") +print(f"聊天模型类型: {llm_config.chat_llm_type}") +print(f"图数据库地址: {graph_config.graph_url}") +print(f"数据库名称: {graph_config.graph_name}") +``` + +## 注意事项 + +1. **安全性**:`.env` 文件包含敏感信息(如 API 密钥),请勿将其提交到版本控制系统 +2. **配置同步**:修改配置后,系统会自动同步到 `.env` 文件 +3. **语言切换**:修改 `LANGUAGE` 配置后需要重启应用程序才能生效 +4. **模型兼容性**:确保所选的模型与你的使用场景兼容 +5. **资源限制**:根据你的硬件资源调整 `MAX_GRAPH_ITEMS`、`EDGE_LIMIT_PRE_LABEL` 等参数 +6. **类型注意**: + - `LIMIT_PROPERTY` 和 `ENABLE_LOGIN` 是字符串类型(\"False\"/\"True\"),不是布尔类型 + - `LANGUAGE`、`CHAT_LLM_TYPE` 等字段使用 Literal 类型限制可选值 + - 大部分字段都是 Optional 类型,支持 None 值,表示未设置 +7. **环境变量 Fallback**: + - OpenAI 配置支持 `OPENAI_BASE_URL` 和 `OPENAI_API_KEY` 环境变量作为 fallback + - OpenAI Embedding 支持独立的环境变量 `OPENAI_EMBEDDING_BASE_URL` 和 `OPENAI_EMBEDDING_API_KEY` + - Cohere 支持 `CO_API_URL` 环境变量 +8. 
**Ollama 配置完整性**: + - 每个 LLM 类型(chat、extract、text2gql)都有对应的 `*_LANGUAGE_MODEL` 配置项 + - 每个服务类型都有独立的 host 和 port 配置,支持分布式部署 + +## 配置文件位置 + +### 系统配置(.env 文件) + +- **主配置文件**:`hugegraph-llm/.env` +- **管理范围**: + - LLMConfig:语言、LLM 提供商配置、API 密钥等 + - HugeGraphConfig:数据库连接、查询限制等 + - AdminConfig:登录设置、令牌等 + +### 提示词配置(YAML 文件) + +- **配置文件**:`src/hugegraph_llm/resources/demo/config_prompt.yaml` +- **管理范围**: + - PromptConfig:所有提示词模板、图谱模式等 + - **注意**:这些配置不存储在 .env 文件中 + +### 配置类定义 + +- **位置**:`hugegraph-llm/src/hugegraph_llm/config/` +- **基类**: + - BaseConfig:用于 .env 文件管理的配置类 + - BasePromptConfig:用于 YAML 文件管理的提示词配置类 +- **UI 配置管理**:`src/hugegraph_llm/demo/rag_demo/configs_block.py` + - Gradio 界面的配置管理组件 + +### 部署配置文件 + +- **Docker 环境模板**:`docker/env.template` + - 用于 Docker 容器部署的环境变量模板 +- **Helm Chart 配置**:`docker/charts/hg-llm/values.yaml` + - Kubernetes 部署配置,包含副本数、镜像、服务等设置 + +### 项目配置文件 + +- **Python 包配置**:`pyproject.toml` + - 项目依赖、构建系统和包管理配置 +- **JSON 示例文件**:`resources/` 目录下的各种 JSON 文件 + - 包含示例数据、查询样本等 + +## 相关文档 + +- [HugeGraph LLM README](README.md) diff --git a/hugegraph-llm/src/hugegraph_llm/config/__init__.py b/hugegraph-llm/src/hugegraph_llm/config/__init__.py index bdd07e95..5d4f5782 100644 --- a/hugegraph-llm/src/hugegraph_llm/config/__init__.py +++ b/hugegraph-llm/src/hugegraph_llm/config/__init__.py @@ -25,12 +25,12 @@ from .hugegraph_config import HugeGraphConfig from .admin_config import AdminConfig from .llm_config import LLMConfig -prompt = PromptConfig() +llm_settings = LLMConfig() +prompt = PromptConfig(llm_settings) prompt.ensure_yaml_file_exists() huge_settings = HugeGraphConfig() admin_settings = AdminConfig() -llm_settings = LLMConfig() package_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) resource_path = os.path.join(package_path, "resources") diff --git a/hugegraph-llm/src/hugegraph_llm/config/generate.py b/hugegraph-llm/src/hugegraph_llm/config/generate.py index ec7e2c28..36910e48 100644 --- a/hugegraph-llm/src/hugegraph_llm/config/generate.py +++ b/hugegraph-llm/src/hugegraph_llm/config/generate.py @@ -28,4 +28,4 @@ if __name__ == "__main__": huge_settings.generate_env() admin_settings.generate_env() llm_settings.generate_env() - PromptConfig().generate_yaml_file() + PromptConfig(llm_settings).generate_yaml_file() diff --git a/hugegraph-llm/src/hugegraph_llm/config/llm_config.py b/hugegraph-llm/src/hugegraph_llm/config/llm_config.py index ff738454..b2029d98 100644 --- a/hugegraph-llm/src/hugegraph_llm/config/llm_config.py +++ b/hugegraph-llm/src/hugegraph_llm/config/llm_config.py @@ -24,7 +24,7 @@ from .models import BaseConfig class LLMConfig(BaseConfig): """LLM settings""" - + language: Literal["EN", "CN"] = "EN" chat_llm_type: Literal["openai", "litellm", "ollama/local"] = "openai" extract_llm_type: Literal["openai", "litellm", "ollama/local"] = "openai" text2gql_llm_type: Literal["openai", "litellm", "ollama/local"] = "openai" diff --git a/hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py b/hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py index 23832bf9..7af1ef92 100644 --- a/hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py +++ b/hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py @@ -29,6 +29,14 @@ F_NAME = "config_prompt.yaml" yaml_file_path = os.path.join(os.getcwd(), "src/hugegraph_llm/resources/demo", F_NAME) +class LiteralStr(str): + pass + +def literal_str_representer(dumper, data): + return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|') + 
+yaml.add_representer(LiteralStr, literal_str_representer) + class BasePromptConfig: graph_schema: str = "" extract_graph_prompt: str = "" @@ -39,6 +47,7 @@ class BasePromptConfig: text2gql_graph_schema: str = "" gremlin_generate_prompt: str = "" doc_input_text: str = "" + _language_generated: str = "" generate_extract_prompt_template: str = "" def ensure_yaml_file_exists(self): @@ -63,80 +72,54 @@ class BasePromptConfig: # Load existing values from the YAML file into the class attributes for key, value in data.items(): setattr(self, key, value) + + # Check if the language in the .env file matches the language in the YAML file + env_lang = (self.llm_settings.language.lower() + if hasattr(self, 'llm_settings') and self.llm_settings.language + else 'en') + yaml_lang = data.get('_language_generated', 'en').lower() + if env_lang.strip() != yaml_lang.strip(): + log.warning( + "Prompt was changed '.env' language is '%s', " + "but '%s' was generated for '%s'. " + "Regenerating the prompt file...", + env_lang, F_NAME, yaml_lang + ) + if self.llm_settings.language.lower() == "cn": + self.answer_prompt = self.answer_prompt_CN + self.extract_graph_prompt = self.extract_graph_prompt_CN + self.gremlin_generate_prompt = self.gremlin_generate_prompt_CN + self.keywords_extract_prompt = self.keywords_extract_prompt_CN + self.doc_input_text = self.doc_input_text_CN + else: + self.answer_prompt = self.answer_prompt_EN + self.extract_graph_prompt = self.extract_graph_prompt_EN + self.gremlin_generate_prompt = self.gremlin_generate_prompt_EN + self.keywords_extract_prompt = self.keywords_extract_prompt_EN + self.doc_input_text = self.doc_input_text_EN else: self.generate_yaml_file() log.info("Prompt file '%s' doesn't exist, create it.", yaml_file_path) def save_to_yaml(self): - indented_schema = "\n".join( - [f" {line}" for line in self.graph_schema.splitlines()] - ) - indented_text2gql_schema = "\n".join( - [f" {line}" for line in self.text2gql_graph_schema.splitlines()] - ) - indented_gremlin_prompt = "\n".join( - [f" {line}" for line in self.gremlin_generate_prompt.splitlines()] - ) - indented_example_prompt = "\n".join( - [f" {line}" for line in self.extract_graph_prompt.splitlines()] - ) - indented_question = "\n".join( - [f" {line}" for line in self.default_question.splitlines()] - ) - indented_custom_related_information = "\n".join( - [f" {line}" for line in self.custom_rerank_info.splitlines()] - ) - indented_default_answer_template = "\n".join( - [f" {line}" for line in self.answer_prompt.splitlines()] - ) - indented_keywords_extract_template = "\n".join( - [f" {line}" for line in self.keywords_extract_prompt.splitlines()] - ) - indented_doc_input_text = "\n".join( - [f" {line}" for line in self.doc_input_text.splitlines()] - ) - indented_generate_extract_prompt = ( - "\n".join( - [ - f" {line}" - for line in self.generate_extract_prompt_template.splitlines() - ] - ) - + "\n" - ) - # This can be extended to add storage fields according to the data needs to be stored - yaml_content = f"""graph_schema: | -{indented_schema} - -text2gql_graph_schema: | -{indented_text2gql_schema} - -extract_graph_prompt: | -{indented_example_prompt} - -default_question: | -{indented_question} -custom_rerank_info: | -{indented_custom_related_information} - -answer_prompt: | -{indented_default_answer_template} - -keywords_extract_prompt: | -{indented_keywords_extract_template} - -gremlin_generate_prompt: | -{indented_gremlin_prompt} - -doc_input_text: | -{indented_doc_input_text} - -generate_extract_prompt_template: | 
-{indented_generate_extract_prompt} -""" + def to_literal(val): + return LiteralStr(val) if isinstance(val, str) else val + data = { + "graph_schema": to_literal(self.graph_schema), + "text2gql_graph_schema": to_literal(self.text2gql_graph_schema), + "extract_graph_prompt": to_literal(self.extract_graph_prompt), + "default_question": to_literal(self.default_question), + "custom_rerank_info": to_literal(self.custom_rerank_info), + "answer_prompt": to_literal(self.answer_prompt), + "keywords_extract_prompt": to_literal(self.keywords_extract_prompt), + "gremlin_generate_prompt": to_literal(self.gremlin_generate_prompt), + "doc_input_text": to_literal(self.doc_input_text), + "_language_generated": str(self.llm_settings.language).lower().strip(), + "generate_extract_prompt_template": to_literal(self.generate_extract_prompt_template), + } with open(yaml_file_path, "w", encoding="utf-8") as file: - file.write(yaml_content) + yaml.dump(data, file, allow_unicode=True, sort_keys=False, default_flow_style=False) def generate_yaml_file(self): if os.path.exists(yaml_file_path): @@ -147,10 +130,21 @@ generate_extract_prompt_template: | update = input() if update.lower() != "y": return - self.save_to_yaml() + + if self.llm_settings.language.lower() == "cn": + self.answer_prompt = self.answer_prompt_CN + self.extract_graph_prompt = self.extract_graph_prompt_CN + self.gremlin_generate_prompt = self.gremlin_generate_prompt_CN + self.keywords_extract_prompt = self.keywords_extract_prompt_CN + self.doc_input_text = self.doc_input_text_CN else: - self.save_to_yaml() - log.info("Prompt file '%s' doesn't exist, create it.", yaml_file_path) + self.answer_prompt = self.answer_prompt_EN + self.extract_graph_prompt = self.extract_graph_prompt_EN + self.gremlin_generate_prompt = self.gremlin_generate_prompt_EN + self.keywords_extract_prompt = self.keywords_extract_prompt_EN + self.doc_input_text = self.doc_input_text_EN + self.save_to_yaml() + log.info("Prompt file '%s' has been generated with default values.", yaml_file_path) def update_yaml_file(self): self.save_to_yaml() diff --git a/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py b/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py index abc1c185..cc92d2fe 100644 --- a/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py +++ b/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py @@ -21,8 +21,10 @@ from hugegraph_llm.config.models.base_prompt_config import BasePromptConfig # pylint: disable=C0301 class PromptConfig(BasePromptConfig): + def __init__(self, llm_config_object): + self.llm_settings = llm_config_object # Data is detached from llm_op/answer_synthesize.py - answer_prompt: str = """You are an expert in the fields of knowledge graphs and natural language processing. + answer_prompt_EN: str = """You are an expert in the fields of knowledge graphs and natural language processing. Please provide precise and accurate answers based on the following context information, which is sorted in order of importance from high to low, without using any fabricated knowledge. 
@@ -43,7 +45,7 @@ Answer: default_question: str = """Who is Sarah ?""" # Note: Users should modify the prompt(examples) according to the real schema and text (property_graph_extract.py) - extract_graph_prompt: str = """## Main Task + extract_graph_prompt_EN: str = """## Main Task Given the following graph schema and a piece of text, your task is to analyze the text and extract information that fits into the schema's structure, formatting the information into vertices and edges as specified. ## Basic Rules: @@ -154,7 +156,7 @@ Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a text2gql_graph_schema: str = "hugegraph" # Extracted from llm_op/keyword_extract.py - keywords_extract_prompt: str = """Instructions: + keywords_extract_prompt_EN: str = """Instructions: Please perform the following tasks on the text below: 1. Extract keywords from the text: - Minimum 0, maximum MAX_KEYWORDS keywords. @@ -186,7 +188,7 @@ Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a {question} """ - gremlin_generate_prompt = """ + gremlin_generate_prompt_EN = """ You are an expert in graph query language (Gremlin). Your role is to understand the schema of the graph, recognize the intent behind user queries, and generate accurate Gremlin code based on the given instructions. ### Tasks @@ -239,7 +241,7 @@ Generate Gremlin from the Following User Query: The generated Gremlin is: """ - doc_input_text: str = """Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010. + doc_input_text_EN: str = """Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010. James, in his professional life, works as a journalist. Additionally, Sarah is the proud owner of the website www.sarahsplace.com, while James manages his own webpage, though the specific URL is not mentioned here. These two individuals, Sarah and James, have not only forged a strong personal bond as roommates but have also @@ -247,7 +249,6 @@ carved out their distinctive digital presence through their respective webpages, and experiences. """ - # TODO: we should switch the prompt automatically based on the language (like using context['language']) answer_prompt_CN: str = """你是知识图谱和自然语言处理领域的专家。 你的任务是基于给定的上下文提供精确和准确的答案。 @@ -424,4 +425,6 @@ Your goal is to generate a new, tailored "Graph Extract Prompt Header" based on {user_scenario} ## Your Generated "Graph Extract Prompt Header": +## Language Requirement: +Please generate the prompt in {language} language. 
""" diff --git a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py index c451c65c..ec8adeab 100644 --- a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py +++ b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py @@ -32,10 +32,14 @@ from hugegraph_llm.demo.rag_demo.configs_block import ( apply_reranker_config, apply_graph_config, ) +from hugegraph_llm.demo.rag_demo.configs_block import get_header_with_language_indicator from hugegraph_llm.demo.rag_demo.other_block import create_other_block from hugegraph_llm.demo.rag_demo.other_block import lifespan from hugegraph_llm.demo.rag_demo.rag_block import create_rag_block, rag_answer -from hugegraph_llm.demo.rag_demo.text2gremlin_block import create_text2gremlin_block, graph_rag_recall +from hugegraph_llm.demo.rag_demo.text2gremlin_block import ( + create_text2gremlin_block, + graph_rag_recall, +) from hugegraph_llm.demo.rag_demo.vector_graph_block import create_vector_graph_block from hugegraph_llm.resources.demo.css import CSS from hugegraph_llm.utils.log import log @@ -62,7 +66,7 @@ def init_rag_ui() -> gr.Interface: title="HugeGraph RAG Platform", css=CSS, ) as hugegraph_llm_ui: - gr.Markdown("# HugeGraph RAG Platform 🚀") + gr.HTML(value=get_header_with_language_indicator(prompt.llm_settings.language)) """ TODO: leave a general idea of the unresolved part @@ -88,7 +92,9 @@ def init_rag_ui() -> gr.Interface: textbox_array_graph_config = create_configs_block() with gr.Tab(label="1. Build RAG Index 💡"): - textbox_input_text, textbox_input_schema, textbox_info_extract_template = create_vector_graph_block() + textbox_input_text, textbox_input_schema, textbox_info_extract_template = ( + create_vector_graph_block() + ) with gr.Tab(label="2. (Graph)RAG & User Functions 📖"): ( textbox_inp, @@ -97,7 +103,9 @@ def init_rag_ui() -> gr.Interface: textbox_custom_related_information, ) = create_rag_block() with gr.Tab(label="3. Text2gremlin ⚙️"): - textbox_gremlin_inp, textbox_gremlin_schema, textbox_gremlin_prompt = create_text2gremlin_block() + textbox_gremlin_inp, textbox_gremlin_schema, textbox_gremlin_prompt = ( + create_text2gremlin_block() + ) with gr.Tab(label="4. Graph Tools 🚧"): create_other_block() with gr.Tab(label="5. 
Admin Tools 🛠"): @@ -123,7 +131,7 @@ def init_rag_ui() -> gr.Interface: prompt.custom_rerank_info, prompt.default_question, huge_settings.graph_name, - prompt.gremlin_generate_prompt + prompt.gremlin_generate_prompt, ) hugegraph_llm_ui.load( # pylint: disable=E1101 @@ -156,7 +164,9 @@ def create_app(): # settings.check_env() prompt.update_yaml_file() auth_enabled = admin_settings.enable_login.lower() == "true" - log.info("(Status) Authentication is %s now.", "enabled" if auth_enabled else "disabled") + log.info( + "(Status) Authentication is %s now.", "enabled" if auth_enabled else "disabled" + ) api_auth = APIRouter(dependencies=[Depends(authenticate)] if auth_enabled else []) hugegraph_llm = init_rag_ui() @@ -176,7 +186,10 @@ def create_app(): # Mount Gradio inside FastAPI # TODO: support multi-user login when need app = gr.mount_gradio_app( - app, hugegraph_llm, path="/", auth=("rag", admin_settings.user_token) if auth_enabled else None + app, + hugegraph_llm, + path="/", + auth=("rag", admin_settings.user_token) if auth_enabled else None, ) return app @@ -188,4 +201,10 @@ if __name__ == "__main__": parser.add_argument("--port", type=int, default=8001, help="port") args = parser.parse_args() - uvicorn.run("hugegraph_llm.demo.rag_demo.app:create_app", host=args.host, port=args.port, factory=True, reload=True) + uvicorn.run( + "hugegraph_llm.demo.rag_demo.app:create_app", + host=args.host, + port=args.port, + factory=True, + reload=True, + ) diff --git a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py index f872ff52..01ea24aa 100644 --- a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py +++ b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py @@ -64,14 +64,25 @@ def test_litellm_chat(api_key, api_base, model_name, max_tokens: int) -> int: return 200 -def test_api_connection(url, method="GET", headers=None, params=None, body=None, auth=None, origin_call=None) -> int: +def test_api_connection( + url, method="GET", headers=None, params=None, body=None, auth=None, origin_call=None +) -> int: # TODO: use fastapi.request / starlette instead? 
log.debug("Request URL: %s", url) try: if method.upper() == "GET": - resp = requests.get(url, headers=headers, params=params, timeout=(1.0, 5.0), auth=auth) + resp = requests.get( + url, headers=headers, params=params, timeout=(1.0, 5.0), auth=auth + ) elif method.upper() == "POST": - resp = requests.post(url, headers=headers, params=params, json=body, timeout=(1.0, 5.0), auth=auth) + resp = requests.post( + url, + headers=headers, + params=params, + json=body, + timeout=(1.0, 5.0), + auth=auth, + ) else: raise ValueError("Unsupported HTTP method, please use GET/POST instead") except requests.exceptions.RequestException as e: @@ -107,12 +118,16 @@ def apply_embedding_config(arg1, arg2, arg3, origin_call=None) -> int: test_url = llm_settings.openai_embedding_api_base + "/embeddings" headers = {"Authorization": f"Bearer {arg1}"} data = {"model": arg3, "input": "test"} - status_code = test_api_connection(test_url, method="POST", headers=headers, body=data, origin_call=origin_call) + status_code = test_api_connection( + test_url, method="POST", headers=headers, body=data, origin_call=origin_call + ) elif embedding_option == "ollama/local": llm_settings.ollama_embedding_host = arg1 llm_settings.ollama_embedding_port = int(arg2) llm_settings.ollama_embedding_model = arg3 - status_code = test_api_connection(f"http://{arg1}:{arg2}", origin_call=origin_call) + status_code = test_api_connection( + f"http://{arg1}:{arg2}", origin_call=origin_call + ) elif embedding_option == "litellm": llm_settings.litellm_embedding_api_key = arg1 llm_settings.litellm_embedding_api_base = arg2 @@ -163,7 +178,7 @@ def apply_reranker_config( def apply_graph_config(url, name, user, pwd, gs, origin_call=None) -> int: # Add URL prefix automatically to improve the user experience - if url and not (url.startswith('http://') or url.startswith('https://')): + if url and not (url.startswith("http://") or url.startswith("https://")): url = f"http://{url}" huge_settings.graph_url = url @@ -183,8 +198,14 @@ def apply_graph_config(url, name, user, pwd, gs, origin_call=None) -> int: return response -def apply_llm_config(current_llm_config, api_key_or_host, api_base_or_port, model_name, max_tokens, - origin_call=None) -> int: +def apply_llm_config( + current_llm_config, + api_key_or_host, + api_base_or_port, + model_name, + max_tokens, + origin_call=None, +) -> int: log.debug("current llm in apply_llm_config is %s", current_llm_config) llm_option = getattr(llm_settings, f"{current_llm_config}_llm_type") log.debug("llm option in apply_llm_config is %s", llm_option) @@ -196,28 +217,43 @@ def apply_llm_config(current_llm_config, api_key_or_host, api_base_or_port, mode setattr(llm_settings, f"openai_{current_llm_config}_language_model", model_name) setattr(llm_settings, f"openai_{current_llm_config}_tokens", int(max_tokens)) - test_url = getattr(llm_settings, f"openai_{current_llm_config}_api_base") + "/chat/completions" + test_url = ( + getattr(llm_settings, f"openai_{current_llm_config}_api_base") + + "/chat/completions" + ) data = { "model": model_name, "temperature": 0.01, "messages": [{"role": "user", "content": "test"}], } headers = {"Authorization": f"Bearer {api_key_or_host}"} - status_code = test_api_connection(test_url, method="POST", headers=headers, body=data, origin_call=origin_call) + status_code = test_api_connection( + test_url, method="POST", headers=headers, body=data, origin_call=origin_call + ) elif llm_option == "ollama/local": setattr(llm_settings, f"ollama_{current_llm_config}_host", api_key_or_host) - 
setattr(llm_settings, f"ollama_{current_llm_config}_port", int(api_base_or_port)) + setattr( + llm_settings, f"ollama_{current_llm_config}_port", int(api_base_or_port) + ) setattr(llm_settings, f"ollama_{current_llm_config}_language_model", model_name) - status_code = test_api_connection(f"http://{api_key_or_host}:{api_base_or_port}", origin_call=origin_call) + status_code = test_api_connection( + f"http://{api_key_or_host}:{api_base_or_port}", origin_call=origin_call + ) elif llm_option == "litellm": setattr(llm_settings, f"litellm_{current_llm_config}_api_key", api_key_or_host) - setattr(llm_settings, f"litellm_{current_llm_config}_api_base", api_base_or_port) - setattr(llm_settings, f"litellm_{current_llm_config}_language_model", model_name) + setattr( + llm_settings, f"litellm_{current_llm_config}_api_base", api_base_or_port + ) + setattr( + llm_settings, f"litellm_{current_llm_config}_language_model", model_name + ) setattr(llm_settings, f"litellm_{current_llm_config}_tokens", int(max_tokens)) - status_code = test_litellm_chat(api_key_or_host, api_base_or_port, model_name, int(max_tokens)) + status_code = test_litellm_chat( + api_key_or_host, api_base_or_port, model_name, int(max_tokens) + ) gr.Info("Configured!") llm_settings.update_env() @@ -231,27 +267,48 @@ def create_configs_block() -> list: with gr.Accordion("1. Set up the HugeGraph server.", open=False): with gr.Row(): graph_config_input = [ - gr.Textbox(value=huge_settings.graph_url, label="url", - info="IP:PORT (e.g. 127.0.0.1:8080) or full URL (e.g. http://127.0.0.1:8080)"), - gr.Textbox(value=huge_settings.graph_name, label="graph", - info="The graph name of HugeGraph-Server instance"), - gr.Textbox(value=huge_settings.graph_user, label="user", - info="Username for graph server auth"), - gr.Textbox(value=huge_settings.graph_pwd, label="pwd", type="password", - info="Password for graph server auth"), - gr.Textbox(value=huge_settings.graph_space, label="graphspace (Optional)", - info="Namespace for multi-tenant scenarios (leave empty if not using graphspaces)"), + gr.Textbox( + value=huge_settings.graph_url, + label="url", + info="IP:PORT (e.g. 127.0.0.1:8080) or full URL (e.g. http://127.0.0.1:8080)", + ), + gr.Textbox( + value=huge_settings.graph_name, + label="graph", + info="The graph name of HugeGraph-Server instance", + ), + gr.Textbox( + value=huge_settings.graph_user, + label="user", + info="Username for graph server auth", + ), + gr.Textbox( + value=huge_settings.graph_pwd, + label="pwd", + type="password", + info="Password for graph server auth", + ), + gr.Textbox( + value=huge_settings.graph_space, + label="graphspace (Optional)", + info="Namespace for multi-tenant scenarios (leave empty if not using graphspaces)", + ), ] graph_config_button = gr.Button("Apply Configuration") graph_config_button.click(apply_graph_config, inputs=graph_config_input) # pylint: disable=no-member # TODO : use OOP to refactor the following code with gr.Accordion("2. Set up the LLM.", open=False): - gr.Markdown("> Tips: The OpenAI option also support openai style api from other providers. " - "**Refresh the page** to load the **latest configs** in __UI__.") - with gr.Tab(label='chat'): - chat_llm_dropdown = gr.Dropdown(choices=["openai", "litellm", "ollama/local"], - value=getattr(llm_settings, "chat_llm_type"), label="type") + gr.Markdown( + "> Tips: The OpenAI option also support openai style api from other providers. " + "**Refresh the page** to load the **latest configs** in __UI__." 
+ ) + with gr.Tab(label="chat"): + chat_llm_dropdown = gr.Dropdown( + choices=["openai", "litellm", "ollama/local"], + value=getattr(llm_settings, "chat_llm_type"), + label="type", + ) apply_llm_config_with_chat_op = partial(apply_llm_config, "chat") @gr.render(inputs=[chat_llm_dropdown]) @@ -259,45 +316,92 @@ def create_configs_block() -> list: llm_settings.chat_llm_type = llm_type if llm_type == "openai": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "openai_chat_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "openai_chat_api_base"), label="api_base"), - gr.Textbox(value=getattr(llm_settings, "openai_chat_language_model"), label="model_name"), - gr.Textbox(value=getattr(llm_settings, "openai_chat_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "openai_chat_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_chat_api_base"), + label="api_base", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_chat_language_model"), + label="model_name", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_chat_tokens"), + label="max_token", + ), ] elif llm_type == "ollama/local": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "ollama_chat_host"), label="host"), - gr.Textbox(value=str(getattr(llm_settings, "ollama_chat_port")), label="port"), - gr.Textbox(value=getattr(llm_settings, "ollama_chat_language_model"), label="model_name"), + gr.Textbox( + value=getattr(llm_settings, "ollama_chat_host"), + label="host", + ), + gr.Textbox( + value=str(getattr(llm_settings, "ollama_chat_port")), + label="port", + ), + gr.Textbox( + value=getattr(llm_settings, "ollama_chat_language_model"), + label="model_name", + ), gr.Textbox(value="", visible=False), ] elif llm_type == "litellm": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "litellm_chat_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "litellm_chat_api_base"), label="api_base", - info="If you want to use the default api_base, please keep it blank"), - gr.Textbox(value=getattr(llm_settings, "litellm_chat_language_model"), label="model_name", - info="Please refer to https://docs.litellm.ai/docs/providers"), - gr.Textbox(value=getattr(llm_settings, "litellm_chat_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "litellm_chat_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_chat_api_base"), + label="api_base", + info="If you want to use the default api_base, please keep it blank", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_chat_language_model"), + label="model_name", + info="Please refer to https://docs.litellm.ai/docs/providers", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_chat_tokens"), + label="max_token", + ), ] else: - llm_config_input = [gr.Textbox(value="", visible=False) for _ in range(4)] + llm_config_input = [ + gr.Textbox(value="", visible=False) for _ in range(4) + ] llm_config_button = gr.Button("Apply configuration") - llm_config_button.click(apply_llm_config_with_chat_op, inputs=llm_config_input) + llm_config_button.click( + apply_llm_config_with_chat_op, inputs=llm_config_input + ) # Determine whether there are Settings in the.env file - env_path = os.path.join(os.getcwd(), ".env") # Load .env from the current working directory + env_path = os.path.join( + os.getcwd(), ".env" + ) # Load .env from the current working 
directory env_vars = dotenv_values(env_path) api_extract_key = env_vars.get("OPENAI_EXTRACT_API_KEY") api_text2sql_key = env_vars.get("OPENAI_TEXT2GQL_API_KEY") if not api_extract_key: - llm_config_button.click(apply_llm_config_with_text2gql_op, inputs=llm_config_input) + llm_config_button.click( + apply_llm_config_with_text2gql_op, inputs=llm_config_input + ) if not api_text2sql_key: - llm_config_button.click(apply_llm_config_with_extract_op, inputs=llm_config_input) - with gr.Tab(label='mini_tasks'): - extract_llm_dropdown = gr.Dropdown(choices=["openai", "litellm", "ollama/local"], - value=getattr(llm_settings, "extract_llm_type"), label="type") + llm_config_button.click( + apply_llm_config_with_extract_op, inputs=llm_config_input + ) + + with gr.Tab(label="mini_tasks"): + extract_llm_dropdown = gr.Dropdown( + choices=["openai", "litellm", "ollama/local"], + value=getattr(llm_settings, "extract_llm_type"), + label="type", + ) apply_llm_config_with_extract_op = partial(apply_llm_config, "extract") @gr.render(inputs=[extract_llm_dropdown]) @@ -305,36 +409,83 @@ def create_configs_block() -> list: llm_settings.extract_llm_type = llm_type if llm_type == "openai": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "openai_extract_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "openai_extract_api_base"), label="api_base"), - gr.Textbox(value=getattr(llm_settings, "openai_extract_language_model"), label="model_name"), - gr.Textbox(value=getattr(llm_settings, "openai_extract_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "openai_extract_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_extract_api_base"), + label="api_base", + ), + gr.Textbox( + value=getattr( + llm_settings, "openai_extract_language_model" + ), + label="model_name", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_extract_tokens"), + label="max_token", + ), ] elif llm_type == "ollama/local": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "ollama_extract_host"), label="host"), - gr.Textbox(value=str(getattr(llm_settings, "ollama_extract_port")), label="port"), - gr.Textbox(value=getattr(llm_settings, "ollama_extract_language_model"), label="model_name"), + gr.Textbox( + value=getattr(llm_settings, "ollama_extract_host"), + label="host", + ), + gr.Textbox( + value=str(getattr(llm_settings, "ollama_extract_port")), + label="port", + ), + gr.Textbox( + value=getattr( + llm_settings, "ollama_extract_language_model" + ), + label="model_name", + ), gr.Textbox(value="", visible=False), ] elif llm_type == "litellm": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "litellm_extract_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "litellm_extract_api_base"), label="api_base", - info="If you want to use the default api_base, please keep it blank"), - gr.Textbox(value=getattr(llm_settings, "litellm_extract_language_model"), label="model_name", - info="Please refer to https://docs.litellm.ai/docs/providers"), - gr.Textbox(value=getattr(llm_settings, "litellm_extract_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "litellm_extract_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_extract_api_base"), + label="api_base", + info="If you want to use the default api_base, please keep it blank", + ), + gr.Textbox( + value=getattr( + llm_settings, 
"litellm_extract_language_model" + ), + label="model_name", + info="Please refer to https://docs.litellm.ai/docs/providers", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_extract_tokens"), + label="max_token", + ), ] else: - llm_config_input = [gr.Textbox(value="", visible=False) for _ in range(4)] + llm_config_input = [ + gr.Textbox(value="", visible=False) for _ in range(4) + ] llm_config_button = gr.Button("Apply configuration") - llm_config_button.click(apply_llm_config_with_extract_op, inputs=llm_config_input) - with gr.Tab(label='text2gql'): - text2gql_llm_dropdown = gr.Dropdown(choices=["openai", "litellm", "ollama/local"], - value=getattr(llm_settings, "text2gql_llm_type"), label="type") + llm_config_button.click( + apply_llm_config_with_extract_op, inputs=llm_config_input + ) + + with gr.Tab(label="text2gql"): + text2gql_llm_dropdown = gr.Dropdown( + choices=["openai", "litellm", "ollama/local"], + value=getattr(llm_settings, "text2gql_llm_type"), + label="type", + ) apply_llm_config_with_text2gql_op = partial(apply_llm_config, "text2gql") @gr.render(inputs=[text2gql_llm_dropdown]) @@ -342,38 +493,82 @@ def create_configs_block() -> list: llm_settings.text2gql_llm_type = llm_type if llm_type == "openai": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "openai_text2gql_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "openai_text2gql_api_base"), label="api_base"), - gr.Textbox(value=getattr(llm_settings, "openai_text2gql_language_model"), label="model_name"), - gr.Textbox(value=getattr(llm_settings, "openai_text2gql_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "openai_text2gql_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_text2gql_api_base"), + label="api_base", + ), + gr.Textbox( + value=getattr( + llm_settings, "openai_text2gql_language_model" + ), + label="model_name", + ), + gr.Textbox( + value=getattr(llm_settings, "openai_text2gql_tokens"), + label="max_token", + ), ] elif llm_type == "ollama/local": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "ollama_text2gql_host"), label="host"), - gr.Textbox(value=str(getattr(llm_settings, "ollama_text2gql_port")), label="port"), - gr.Textbox(value=getattr(llm_settings, "ollama_text2gql_language_model"), label="model_name"), + gr.Textbox( + value=getattr(llm_settings, "ollama_text2gql_host"), + label="host", + ), + gr.Textbox( + value=str(getattr(llm_settings, "ollama_text2gql_port")), + label="port", + ), + gr.Textbox( + value=getattr( + llm_settings, "ollama_text2gql_language_model" + ), + label="model_name", + ), gr.Textbox(value="", visible=False), ] elif llm_type == "litellm": llm_config_input = [ - gr.Textbox(value=getattr(llm_settings, "litellm_text2gql_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "litellm_text2gql_api_base"), label="api_base", - info="If you want to use the default api_base, please keep it blank"), - gr.Textbox(value=getattr(llm_settings, "litellm_text2gql_language_model"), label="model_name", - info="Please refer to https://docs.litellm.ai/docs/providers"), - gr.Textbox(value=getattr(llm_settings, "litellm_text2gql_tokens"), label="max_token"), + gr.Textbox( + value=getattr(llm_settings, "litellm_text2gql_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_text2gql_api_base"), + label="api_base", + info="If you want to use the default api_base, 
please keep it blank", + ), + gr.Textbox( + value=getattr( + llm_settings, "litellm_text2gql_language_model" + ), + label="model_name", + info="Please refer to https://docs.litellm.ai/docs/providers", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_text2gql_tokens"), + label="max_token", + ), ] else: - llm_config_input = [gr.Textbox(value="", visible=False) for _ in range(4)] + llm_config_input = [ + gr.Textbox(value="", visible=False) for _ in range(4) + ] llm_config_button = gr.Button("Apply configuration") - llm_config_button.click(apply_llm_config_with_text2gql_op, inputs=llm_config_input) + llm_config_button.click( + apply_llm_config_with_text2gql_op, inputs=llm_config_input + ) with gr.Accordion("3. Set up the Embedding.", open=False): embedding_dropdown = gr.Dropdown( - choices=["openai", "litellm", "ollama/local"], value=llm_settings.embedding_type, - label="Embedding" + choices=["openai", "litellm", "ollama/local"], + value=llm_settings.embedding_type, + label="Embedding", ) @gr.render(inputs=[embedding_dropdown]) @@ -382,26 +577,52 @@ def create_configs_block() -> list: if embedding_type == "openai": with gr.Row(): embedding_config_input = [ - gr.Textbox(value=llm_settings.openai_embedding_api_key, label="api_key", type="password"), - gr.Textbox(value=llm_settings.openai_embedding_api_base, label="api_base"), - gr.Textbox(value=llm_settings.openai_embedding_model, label="model_name"), + gr.Textbox( + value=llm_settings.openai_embedding_api_key, + label="api_key", + type="password", + ), + gr.Textbox( + value=llm_settings.openai_embedding_api_base, + label="api_base", + ), + gr.Textbox( + value=llm_settings.openai_embedding_model, + label="model_name", + ), ] elif embedding_type == "ollama/local": with gr.Row(): embedding_config_input = [ - gr.Textbox(value=llm_settings.ollama_embedding_host, label="host"), - gr.Textbox(value=str(llm_settings.ollama_embedding_port), label="port"), - gr.Textbox(value=llm_settings.ollama_embedding_model, label="model_name"), + gr.Textbox( + value=llm_settings.ollama_embedding_host, label="host" + ), + gr.Textbox( + value=str(llm_settings.ollama_embedding_port), label="port" + ), + gr.Textbox( + value=llm_settings.ollama_embedding_model, + label="model_name", + ), ] elif embedding_type == "litellm": with gr.Row(): embedding_config_input = [ - gr.Textbox(value=getattr(llm_settings, "litellm_embedding_api_key"), label="api_key", - type="password"), - gr.Textbox(value=getattr(llm_settings, "litellm_embedding_api_base"), label="api_base", - info="If you want to use the default api_base, please keep it blank"), - gr.Textbox(value=getattr(llm_settings, "litellm_embedding_model"), label="model_name", - info="Please refer to https://docs.litellm.ai/docs/embedding/supported_embedding"), + gr.Textbox( + value=getattr(llm_settings, "litellm_embedding_api_key"), + label="api_key", + type="password", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_embedding_api_base"), + label="api_base", + info="If you want to use the default api_base, please keep it blank", + ), + gr.Textbox( + value=getattr(llm_settings, "litellm_embedding_model"), + label="model_name", + info="Please refer to https://docs.litellm.ai/docs/embedding/supported_embedding", + ), ] else: embedding_config_input = [ @@ -427,18 +648,30 @@ def create_configs_block() -> list: @gr.render(inputs=[reranker_dropdown]) def reranker_settings(reranker_type): - llm_settings.reranker_type = reranker_type if reranker_type != "None" else None + llm_settings.reranker_type = ( + reranker_type if 
reranker_type != "None" else None + ) if reranker_type == "cohere": with gr.Row(): reranker_config_input = [ - gr.Textbox(value=llm_settings.reranker_api_key, label="api_key", type="password"), + gr.Textbox( + value=llm_settings.reranker_api_key, + label="api_key", + type="password", + ), gr.Textbox(value=llm_settings.reranker_model, label="model"), - gr.Textbox(value=llm_settings.cohere_base_url, label="base_url"), + gr.Textbox( + value=llm_settings.cohere_base_url, label="base_url" + ), ] elif reranker_type == "siliconflow": with gr.Row(): reranker_config_input = [ - gr.Textbox(value=llm_settings.reranker_api_key, label="api_key", type="password"), + gr.Textbox( + value=llm_settings.reranker_api_key, + label="api_key", + type="password", + ), gr.Textbox( value="BAAI/bge-reranker-v2-m3", label="model", @@ -460,5 +693,29 @@ def create_configs_block() -> list: fn=apply_reranker_config, inputs=reranker_config_input, # pylint: disable=no-member ) + # The reason for returning this partial value is the functional need to refresh the ui return graph_config_input + + +def get_header_with_language_indicator(language: str) -> str: + language_class = language.lower() + + if language == "CN": + title_text = "当前prompt语言: 中文 (CN)" + else: + title_text = "Current prompt Language: English (EN)" + html_content = f""" + <div class="header-container"> + <h1 class="header-title">HugeGraph RAG Platform 🚀</h1> + <div class="language-indicator-container"> + <div class="language-indicator {language_class}"> + {language} + </div> + <div class="custom-tooltip"> + {title_text} + </div> + </div> + </div> + """ + return html_content diff --git a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py index 1d5e8a94..9897f420 100644 --- a/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py +++ b/hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py @@ -34,16 +34,25 @@ from hugegraph_llm.utils.graph_index_utils import ( clean_all_graph_data, update_vid_embedding, extract_graph, - import_graph_data, build_schema, + import_graph_data, + build_schema, ) from hugegraph_llm.utils.hugegraph_utils import check_graph_db_connection from hugegraph_llm.utils.log import log -from hugegraph_llm.utils.vector_index_utils import clean_vector_index, build_vector_index, get_vector_index_info +from hugegraph_llm.utils.vector_index_utils import ( + clean_vector_index, + build_vector_index, + get_vector_index_info, +) def store_prompt(doc, schema, example_prompt): # update env variables: doc, schema and example_prompt - if prompt.doc_input_text != doc or prompt.graph_schema != schema or prompt.extract_graph_prompt != example_prompt: + if ( + prompt.doc_input_text != doc + or prompt.graph_schema != schema + or prompt.extract_graph_prompt != example_prompt + ): prompt.doc_input_text = doc prompt.graph_schema = schema prompt.extract_graph_prompt = example_prompt @@ -55,18 +64,22 @@ def generate_prompt_for_ui(source_text, scenario, example_name): Handles the UI logic for generating a new prompt. It calls the PromptGenerate operator. """ if not all([source_text, scenario, example_name]): - gr.Warning("Please provide original text, expected scenario, and select an example!") + gr.Warning( + "Please provide original text, expected scenario, and select an example!" 
+ ) return gr.update() try: prompt_generator = PromptGenerate(llm=LLMs().get_chat_llm()) context = { "source_text": source_text, "scenario": scenario, - "example_name": example_name + "example_name": example_name, } result_context = prompt_generator.run(context) # Presents the result of generating prompt - generated_prompt = result_context.get("generated_extract_prompt", "Generation failed. Please check the logs.") + generated_prompt = result_context.get( + "generated_extract_prompt", "Generation failed. Please check the logs." + ) gr.Info("Prompt generated successfully!") return generated_prompt except Exception as e: @@ -77,51 +90,86 @@ def generate_prompt_for_ui(source_text, scenario, example_name): def load_example_names(): """Load all candidate examples""" try: - examples_path = os.path.join(resource_path, "prompt_examples", "prompt_examples.json") - with open(examples_path, 'r', encoding='utf-8') as f: + examples_path = os.path.join( + resource_path, "prompt_examples", "prompt_examples.json" + ) + with open(examples_path, "r", encoding="utf-8") as f: examples = json.load(f) return [example.get("name", "Unnamed example") for example in examples] except (FileNotFoundError, json.JSONDecodeError): return ["No available examples"] + def load_query_examples(): - """Load query examples from JSON file""" + """Load query examples from JSON file based on the prompt language setting""" try: - examples_path = os.path.join(resource_path, "prompt_examples", "query_examples.json") - with open(examples_path, 'r', encoding='utf-8') as f: + language = getattr( + prompt, + "language", + getattr(prompt.llm_settings, "language", "EN") + if hasattr(prompt, "llm_settings") + else "EN", + ) + if language.upper() == "CN": + examples_path = os.path.join( + resource_path, "prompt_examples", "query_examples_CN.json" + ) + else: + examples_path = os.path.join( + resource_path, "prompt_examples", "query_examples.json" + ) + + with open(examples_path, "r", encoding="utf-8") as f: examples = json.load(f) - return json.dumps(examples, indent=2) + return json.dumps(examples, indent=2, ensure_ascii=False) except (FileNotFoundError, json.JSONDecodeError): - return "[]" + try: + examples_path = os.path.join( + resource_path, "prompt_examples", "query_examples.json" + ) + with open(examples_path, "r", encoding="utf-8") as f: + examples = json.load(f) + return json.dumps(examples, indent=2, ensure_ascii=False) + except (FileNotFoundError, json.JSONDecodeError): + return "[]" + def load_schema_fewshot_examples(): """Load few-shot examples from a JSON file""" try: - examples_path = os.path.join(resource_path, "prompt_examples", "schema_examples.json") - with open(examples_path, 'r', encoding='utf-8') as f: + examples_path = os.path.join( + resource_path, "prompt_examples", "schema_examples.json" + ) + with open(examples_path, "r", encoding="utf-8") as f: examples = json.load(f) - return json.dumps(examples, indent=2) + return json.dumps(examples, indent=2, ensure_ascii=False) except (FileNotFoundError, json.JSONDecodeError): return "[]" + def update_example_preview(example_name): """Update the display content based on the selected example name.""" try: - examples_path = os.path.join(resource_path, "prompt_examples", "prompt_examples.json") - with open(examples_path, 'r', encoding='utf-8') as f: + examples_path = os.path.join( + resource_path, "prompt_examples", "prompt_examples.json" + ) + with open(examples_path, "r", encoding="utf-8") as f: all_examples = json.load(f) - selected_example = next((ex for ex in all_examples 
if ex.get("name") == example_name), None) + selected_example = next( + (ex for ex in all_examples if ex.get("name") == example_name), None + ) if selected_example: return ( - selected_example.get('description', ''), - selected_example.get('text', ''), - selected_example.get('prompt', ''), + selected_example.get("description", ""), + selected_example.get("text", ""), + selected_example.get("prompt", ""), ) except (FileNotFoundError, json.JSONDecodeError) as e: log.warning("Could not update example preview: %s", e) return "", "", "" + def _create_prompt_helper_block(demo, input_text, info_extract_template): with gr.Accordion("Graph Extraction Prompt Generator", open=False): gr.Markdown( @@ -131,32 +179,45 @@ def _create_prompt_helper_block(demo, input_text, info_extract_template): user_scenario_text = gr.Textbox( label="Expected scenario/direction", info="For example: social relationships, financial knowledge graphs, etc.", - lines=2 + lines=2, ) example_names = load_example_names() few_shot_dropdown = gr.Dropdown( choices=example_names, label="Select a Few-shot example as a reference", - value=example_names[0] if example_names and example_names[0] != "No available examples" else None + value=example_names[0] + if example_names and example_names[0] != "No available examples" + else None, ) with gr.Accordion("View example details", open=False): example_desc_preview = gr.Markdown(label="Example description") - example_text_preview = gr.Textbox(label="Example input text", lines=5, interactive=False) - example_prompt_preview = gr.Code(label="Example Graph Extract Prompt", language="markdown", - interactive=False) + example_text_preview = gr.Textbox( + label="Example input text", lines=5, interactive=False + ) + example_prompt_preview = gr.Code( + label="Example Graph Extract Prompt", + language="markdown", + interactive=False, + ) - generate_prompt_btn = gr.Button("🚀 Auto-generate Graph Extract Prompt", variant="primary") + generate_prompt_btn = gr.Button( + "🚀 Auto-generate Graph Extract Prompt", variant="primary" + ) # Bind the change event of the dropdown menu few_shot_dropdown.change( fn=update_example_preview, inputs=[few_shot_dropdown], - outputs=[example_desc_preview, example_text_preview, example_prompt_preview] + outputs=[ + example_desc_preview, + example_text_preview, + example_prompt_preview, + ], ) # Bind the click event of the generated button. generate_prompt_btn.click( fn=generate_prompt_for_ui, inputs=[input_text, user_scenario_text, few_shot_dropdown], - outputs=[info_extract_template] + outputs=[info_extract_template], ) # Preload the page on the first load. 
@@ -168,7 +229,11 @@ def _create_prompt_helper_block(demo, input_text, info_extract_template): demo.load( fn=warm_up_preview, inputs=[few_shot_dropdown], - outputs=[example_desc_preview, example_text_preview, example_prompt_preview] + outputs=[ + example_desc_preview, + example_text_preview, + example_prompt_preview, + ], ) @@ -179,12 +244,12 @@ def _build_schema_and_provide_feedback(input_text, query_example, few_shot): gr.Info("Schema generated successfully!") return generated_schema + def create_vector_graph_block(): # pylint: disable=no-member # pylint: disable=C0301 # pylint: disable=unexpected-keyword-arg with gr.Blocks() as demo: - gr.Markdown( """## Build Vector/Graph Index & Extract Knowledge Graph - Docs: @@ -207,7 +272,7 @@ def create_vector_graph_block(): value=prompt.doc_input_text, label="Input Doc(s)", lines=20, - show_copy_button=True + show_copy_button=True, ) with gr.Tab("file") as tab_upload_file: input_file = gr.File( @@ -215,13 +280,23 @@ def create_vector_graph_block(): label="Docs (multi-files can be selected together)", file_count="multiple", ) - input_schema = gr.Code(value=prompt.graph_schema, label="Graph Schema", language="json", lines=15, - max_lines=29) + input_schema = gr.Code( + value=prompt.graph_schema, + label="Graph Schema", + language="json", + lines=15, + max_lines=29, + ) info_extract_template = gr.Code( - value=prompt.extract_graph_prompt, label="Graph Extract Prompt Header", language="markdown", lines=15, - max_lines=29 + value=prompt.extract_graph_prompt, + label="Graph Extract Prompt Header", + language="markdown", + lines=15, + max_lines=29, + ) + out = gr.Code( + label="Output Info", language="json", elem_classes="code-container-edit" ) - out = gr.Code(label="Output Info", language="json", elem_classes="code-container-edit") with gr.Row(): with gr.Accordion("Get RAG Info", open=False): @@ -230,8 +305,12 @@ def create_vector_graph_block(): graph_index_btn0 = gr.Button("Get Graph Index Info", size="sm") with gr.Accordion("Clear RAG Data", open=False): with gr.Column(): - vector_index_btn1 = gr.Button("Clear Chunks Vector Index", size="sm") - graph_index_btn1 = gr.Button("Clear Graph Vid Vector Index", size="sm") + vector_index_btn1 = gr.Button( + "Clear Chunks Vector Index", size="sm" + ) + graph_index_btn1 = gr.Button( + "Clear Graph Vid Vector Index", size="sm" + ) graph_data_btn0 = gr.Button("Clear Graph Data", size="sm") vector_import_bt = gr.Button("Import into Vector", variant="primary") @@ -251,14 +330,14 @@ def create_vector_graph_block(): label="Query Examples", language="json", lines=10, - max_lines=15 + max_lines=15, ) few_shot = gr.Code( value=load_schema_fewshot_examples(), label="Few-shot Example", language="json", lines=10, - max_lines=15 + max_lines=15, ) build_schema_bt = gr.Button("Generate Schema", variant="primary") _create_prompt_helper_block(demo, input_text, info_extract_template) @@ -271,7 +350,9 @@ def create_vector_graph_block(): store_prompt, inputs=[input_text, input_schema, info_extract_template], ) - vector_import_bt.click(build_vector_index, inputs=[input_file, input_text], outputs=out).then( + vector_import_bt.click( + build_vector_index, inputs=[input_file, input_text], outputs=out + ).then( store_prompt, inputs=[input_text, input_schema, info_extract_template], ) @@ -294,11 +375,17 @@ def create_vector_graph_block(): # origin_out = gr.Textbox(visible=False) graph_extract_bt.click( - extract_graph, inputs=[input_file, input_text, input_schema, info_extract_template], outputs=[out] - ).then(store_prompt, 
inputs=[input_text, input_schema, info_extract_template], ) + extract_graph, + inputs=[input_file, input_text, input_schema, info_extract_template], + outputs=[out], + ).then( + store_prompt, + inputs=[input_text, input_schema, info_extract_template], + ) - graph_loading_bt.click(import_graph_data, inputs=[out, input_schema], outputs=[out]).then( - update_vid_embedding).then( + graph_loading_bt.click( + import_graph_data, inputs=[out, input_schema], outputs=[out] + ).then(update_vid_embedding).then( store_prompt, inputs=[input_text, input_schema, info_extract_template], ) @@ -307,10 +394,14 @@ def create_vector_graph_block(): build_schema_bt.click( _build_schema_and_provide_feedback, inputs=[input_text, query_example, few_shot], - outputs=[input_schema] + outputs=[input_schema], ).then( store_prompt, - inputs=[input_text, input_schema, info_extract_template], # TODO: Store the updated examples + inputs=[ + input_text, + input_schema, + info_extract_template, + ], # TODO: Store the updated examples ) def on_tab_select(input_f, input_t, evt: gr.SelectData): @@ -321,8 +412,16 @@ def create_vector_graph_block(): return [], input_t return [], "" - tab_upload_file.select(fn=on_tab_select, inputs=[input_file, input_text], outputs=[input_file, input_text]) - tab_upload_text.select(fn=on_tab_select, inputs=[input_file, input_text], outputs=[input_file, input_text]) + tab_upload_file.select( + fn=on_tab_select, + inputs=[input_file, input_text], + outputs=[input_file, input_text], + ) + tab_upload_text.select( + fn=on_tab_select, + inputs=[input_file, input_text], + outputs=[input_file, input_text], + ) return input_text, input_schema, info_extract_template @@ -342,14 +441,16 @@ async def timely_update_vid_embedding(interval_seconds: int = 3600): "name": huge_settings.graph_name, "user": huge_settings.graph_user, "pwd": huge_settings.graph_pwd, - "graph_space": huge_settings.graph_space + "graph_space": huge_settings.graph_space, } if check_graph_db_connection(**config): await asyncio.to_thread(update_vid_embedding) log.info("update_vid_embedding executed successfully") else: - log.warning("HugeGraph server connection failed, so skipping update_vid_embedding, " - "please check graph configuration and connectivity") + log.warning( + "HugeGraph server connection failed, so skipping update_vid_embedding, " + "please check graph configuration and connectivity" + ) except asyncio.CancelledError as ce: log.info("Periodic task has been cancelled due to: %s", ce) break diff --git a/hugegraph-llm/src/hugegraph_llm/operators/llm_op/prompt_generate.py b/hugegraph-llm/src/hugegraph_llm/operators/llm_op/prompt_generate.py index a7ea1e3d..82326f00 100644 --- a/hugegraph-llm/src/hugegraph_llm/operators/llm_op/prompt_generate.py +++ b/hugegraph-llm/src/hugegraph_llm/operators/llm_op/prompt_generate.py @@ -55,7 +55,8 @@ class PromptGenerate: few_shot_text=few_shot_example.get('text', ''), few_shot_prompt=few_shot_example.get('prompt', ''), user_text=source_text, - user_scenario=scenario + user_scenario=scenario, + language=prompt_tpl.llm_settings.language ) log.debug("Meta-prompt sent to LLM: %s", meta_prompt) generated_prompt = self.llm.generate(prompt=meta_prompt) diff --git a/hugegraph-llm/src/hugegraph_llm/resources/demo/css.py b/hugegraph-llm/src/hugegraph_llm/resources/demo/css.py index 0f2d48d5..f32f3ffc 100644 --- a/hugegraph-llm/src/hugegraph_llm/resources/demo/css.py +++ b/hugegraph-llm/src/hugegraph_llm/resources/demo/css.py @@ -58,4 +58,113 @@ footer { font-weight: bold; margin-bottom: -5px; } + +/* 
Language Indicator Styles */ +.header-container { + display: flex; + justify-content: space-between; + align-items: center; + margin-bottom: 20px; + padding: 0 10px; +} + +.header-title { + margin: 0; + padding: 0; + font-size: 32px; + font-weight: 600; +} + +.language-indicator { + display: inline-flex; + align-items: center; + justify-content: center; + min-width: 24px; + height: 24px; + padding: 2px 6px; + border-radius: 12px; + font-size: 11px; + font-weight: 700; + font-family: 'SF Pro Display', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif; + text-transform: uppercase; + letter-spacing: 0.5px; + cursor: default; + transition: all 0.2s ease; + box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1); + border: 1px solid rgba(255, 255, 255, 0.2); + backdrop-filter: blur(10px); + -webkit-backdrop-filter: blur(10px); +} + +.language-indicator.en { + background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); + color: white; + text-shadow: 0 1px 2px rgba(0, 0, 0, 0.2); +} + +.language-indicator.cn { + background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); + color: white; + text-shadow: 0 1px 2px rgba(0, 0, 0, 0.2); +} + +.language-indicator:hover { + transform: translateY(-1px); + box-shadow: 0 2px 8px rgba(0, 0, 0, 0.15); +} + +/* Custom Tooltip Styles */ +.language-indicator-container { + position: relative; + display: inline-block; +} + +.custom-tooltip { + position: absolute; + top: 50%; + right: 100%; + transform: translateY(-50%); + background: rgba(0, 0, 0, 0.8); + color: white; + padding: 4px 8px; + border-radius: 3px; + font-size: 11px; + white-space: nowrap; + z-index: 1000; + opacity: 0; + visibility: hidden; + transition: opacity 0.2s ease; + margin-right: 8px; + box-shadow: 0 1px 4px rgba(0, 0, 0, 0.2); +} + +.language-indicator-container:hover .custom-tooltip { + opacity: 1; + visibility: visible; +} + +.custom-tooltip::before { + content: ''; + position: absolute; + top: 50%; + left: 100%; + transform: translateY(-50%); + border-top: 3px solid transparent; + border-bottom: 3px solid transparent; + border-left: 3px solid rgba(0, 0, 0, 0.8); +} + +/* Responsive adjustments */ +@media (max-width: 768px) { + .header-container { + flex-direction: column; + align-items: flex-start; + gap: 10px; + } + + .language-indicator { + align-self: flex-end; + margin-top: -10px; + } +} """ diff --git a/hugegraph-llm/src/hugegraph_llm/resources/prompt_examples/query_examples_CN.json b/hugegraph-llm/src/hugegraph_llm/resources/prompt_examples/query_examples_CN.json new file mode 100644 index 00000000..e21d35e6 --- /dev/null +++ b/hugegraph-llm/src/hugegraph_llm/resources/prompt_examples/query_examples_CN.json @@ -0,0 +1,9 @@ +[ + "属性过滤:查找所有年龄大于 30 的'person'节点,并返回他们的姓名和职业", + "关系遍历:查找名为 Alice 的所有人的室友,并返回他们的姓名和年龄", + "最短路径:查找 Bob 和 Charlie 之间的最短路径,并显示路径上的边标签", + "子图匹配:查找所有关注同一网页的朋友对,并返回他们的姓名和 URL", + "聚合统计:统计每种职业的人数,并计算他们的平均年龄", + "时间过滤:查找所有在 2025-01-01 之后创建的节点,并返回它们的名称和创建时间", + "Top-N 查询:列出访问量最多的前 10 个网页,包含其 URL 和访问次数" +] diff --git a/style/pylint.conf b/style/pylint.conf index f23b87f9..6ccb7a07 100644 --- a/style/pylint.conf +++ b/style/pylint.conf @@ -466,8 +466,9 @@ disable=raw-checker-failed, W0622, # Redefining built-in 'id' (redefined-builtin) R0904, # Too many public methods (27/20) (too-many-public-methods) E1120, # TODO: unbound-method-call-no-value-for-parameter - R0917, # Too many positional arguments (6/5) (too-many-positional-arguments) + # R0917, # Too many positional arguments (6/5) (too-many-positional-arguments) - Not available in older pylint versions C0103, + 
E1101, # no-member - Dynamic attributes set via setattr() in BasePromptConfig # Enable the message, report, category or checker with the given id(s). You can # either give multiple identifier separated by comma (,) or put this option