LRriver opened a new issue, #348: URL: https://github.com/apache/hugegraph-ai/issues/348
### Search before asking

- [x] I had searched in the [feature](https://github.com/apache/hugegraph-ai/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Feature Description

HugeGraph-LLM has a graph extraction flow, and the Gradio demo can call it, but there is no public REST endpoint for graph extraction. Please add a FastAPI endpoint that exposes graph extraction through the existing scheduler/flow boundary. This gives users a programmatic way to extract vertices and edges without using the demo UI.

## Current verification

- The API router currently exposes `/rag`, `/rag/graph`, `/config/*`, and `/text2gremlin` in `hugegraph-llm/src/hugegraph_llm/api/rag_api.py`.
- The FastAPI app in `hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py` registers `rag_http_api(...)` and `admin_http_api(...)`, but it registers no graph extraction API.
- The graph extraction flow already exists as `FlowName.GRAPH_EXTRACT` and is registered in `Scheduler` in `hugegraph-llm/src/hugegraph_llm/flows/scheduler.py`.
- The demo helper calls the flow through `SchedulerSingleton`, so the endpoint should reuse that path rather than instantiate low-level operators directly.

## Suggested endpoint

`POST /graph/extract`

Suggested request fields:

- `texts`: a string or a list of strings.
- `schema`: the graph schema as a JSON string or object, matching what the existing flow expects.
- `example_prompt`: optional graph extraction prompt header.
- `extract_type`: default `property_graph`.
- `language`: default `zh`.
- `split_type`: default `document`; valid values should match `ChunkSplit`.
- `include_meta`: optional flag to include chunk count, call count, and warnings.

Suggested response fields:

- `vertices`
- `edges`
- `warning`, when extraction returns no graph data or partial errors occur.
- `meta`, when requested.
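A minimal sketch of the request-side validation for the fields listed above, written with only the standard library so it runs on its own. `GraphExtractRequest`, `RequestValidationError`, and `normalize_schema` are hypothetical names, not existing hugegraph-llm code; in the actual API these checks would presumably live in a Pydantic model whose validation errors FastAPI turns into a 4xx response. The sketch assumes a usable schema is a non-empty JSON object (if the current `SchemaNode` accepts other forms, that check would need loosening) and leaves `split_type` unchecked, since its valid values should come from `ChunkSplit`.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


class RequestValidationError(ValueError):
    """Hypothetical error type for input the endpoint should reject with a 4xx status."""


def normalize_schema(schema: Union[str, Dict[str, Any]]) -> str:
    """Return the schema as a JSON string, whether the client sent a string or an object."""
    if isinstance(schema, str):
        try:
            parsed = json.loads(schema)
        except json.JSONDecodeError as exc:
            raise RequestValidationError(f"schema is not valid JSON: {exc.msg}") from exc
    elif isinstance(schema, dict):
        parsed = schema
    else:
        raise RequestValidationError("schema must be a JSON string or object")
    # Assumption: a usable schema is a non-empty JSON object.
    if not isinstance(parsed, dict) or not parsed:
        raise RequestValidationError("schema must be a non-empty JSON object")
    return json.dumps(parsed, ensure_ascii=False)


@dataclass
class GraphExtractRequest:
    texts: Union[str, List[str]]
    schema: Union[str, Dict[str, Any]]
    example_prompt: Optional[str] = None
    extract_type: str = "property_graph"
    language: str = "zh"
    split_type: str = "document"  # not validated here; real values come from ChunkSplit
    include_meta: bool = False

    def __post_init__(self) -> None:
        # Accept one string or a list, and always store a list of non-blank strings.
        texts = [self.texts] if isinstance(self.texts, str) else self.texts
        if not isinstance(texts, list) or not texts:
            raise RequestValidationError("texts must be a non-empty string or list")
        if any(not isinstance(t, str) or not t.strip() for t in texts):
            raise RequestValidationError("texts must not contain empty entries")
        self.texts = texts
        # Normalize object schemas to the JSON string shape the flow consumes.
        self.schema = normalize_schema(self.schema)
```

Normalizing through `json.dumps` means the flow receives the same string shape regardless of how the client sent the schema. If this becomes a Pydantic model, note that a field literally named `schema` may shadow the `BaseModel.schema` attribute and trigger a warning, so an alias might be needed.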
## Mermaid reference

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI /graph/extract
    participant Scheduler as SchedulerSingleton
    participant Flow as GraphExtractFlow
    participant Nodes as Schema + ChunkSplit + Extract nodes
    Client->>API: POST texts, schema, prompt, split_type
    API->>API: validate request
    API->>Scheduler: schedule_flow(FlowName.GRAPH_EXTRACT, ...)
    Scheduler->>Flow: prepare/build pipeline
    Flow->>Nodes: run graph extraction
    Nodes-->>Flow: vertices, edges, metadata
    Flow-->>Scheduler: post_deal result
    Scheduler-->>API: extraction result
    API-->>Client: JSON response
```

## Acceptance criteria

- `POST /graph/extract` is available from the FastAPI app.
- The endpoint uses `SchedulerSingleton` and `FlowName.GRAPH_EXTRACT`.
- Request validation rejects empty text and an invalid schema with 4xx errors.
- If `schema` is accepted as an object, the API layer normalizes it to the JSON string shape expected by the current `SchemaNode`.
- The response returns structured JSON, not a JSON-encoded string nested inside a string.
- Existing demo graph extraction behavior remains unchanged.

## Suggested tests

- Pydantic request model tests for valid and invalid inputs.
- An API test with a mocked scheduler/flow result.
- A contract test that the endpoint returns `vertices` and `edges` as arrays.
- A regression test that the existing `/rag` and `/text2gremlin` endpoints are still registered.

## Dependencies

- This can be implemented independently, but it should share the `split_type` contract from `01-configurable-graph-extract-chunk-split.md` if that task has landed.

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
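For the acceptance criteria and suggested tests above, a sketch of the handler's core, again standard-library only: the scheduler is injected so tests can pass a mock, a JSON-encoded string result is decoded into real JSON, `vertices` and `edges` are always arrays, and `warning`/`meta` are added as the issue describes. `run_graph_extract` and `GRAPH_EXTRACT_FLOW` are hypothetical; the real route would obtain the scheduler through `SchedulerSingleton` and pass `FlowName.GRAPH_EXTRACT`. The positional argument order used for `schedule_flow` and the `meta` key in the flow result are assumptions, not the actual signature or output format.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for FlowName.GRAPH_EXTRACT; the real route would pass the enum member.
GRAPH_EXTRACT_FLOW = "graph_extract"


def run_graph_extract(
    schedule_flow: Callable[..., Any],
    texts: List[str],
    schema_json: str,
    include_meta: bool = False,
    **flow_kwargs: Any,
) -> Dict[str, Any]:
    """Call the (injected) scheduler and shape its result into the response body."""
    # Assumed call shape: flow name, schema, texts, then keyword options.
    raw = schedule_flow(GRAPH_EXTRACT_FLOW, schema_json, texts, **flow_kwargs)

    # The flow may hand back a JSON-encoded string; decode it so the API never
    # returns a JSON string nested inside a string.
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            raw = {"warning": "flow returned non-JSON output"}
    if not isinstance(raw, dict):
        raw = {"warning": f"unexpected flow result type: {type(raw).__name__}"}

    # Always return arrays, even when the flow omits a key or returns null.
    vertices = list(raw.get("vertices") or [])
    edges = list(raw.get("edges") or [])
    body: Dict[str, Any] = {"vertices": vertices, "edges": edges}

    # Pass through any warning from the flow; add one when nothing was extracted.
    warning = raw.get("warning")
    if not vertices and not edges and not warning:
        warning = "extraction returned no graph data"
    if warning:
        body["warning"] = warning

    if include_meta:
        # Hypothetical "meta" key; the real flow may report chunk/call counts differently.
        body["meta"] = raw.get("meta", {})
    return body
```

Injecting `schedule_flow` keeps the HTTP layer thin and lets the mocked-scheduler and contract tests run without an LLM backend; a FastAPI route would call this function and return the dict directly, leaving serialization to FastAPI.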
