LRriver opened a new issue, #343: URL: https://github.com/apache/hugegraph-ai/issues/343
### Search before asking

- [x] I have searched the [feature](https://github.com/apache/hugegraph-ai/issues?q=is%3Aissue+label%3A%22Feature%22) issues and found no similar feature requirement.

### Feature Description

Graph extraction already has a `ChunkSplitNode`, but the default graph extraction flow forces `split_type = "document"`, so extraction effectively runs on the whole document as a single chunk.

Please expose a configurable split strategy for graph extraction while keeping the current behavior as the default. This should help users extract knowledge graphs from longer documents without sending the entire text to the LLM in one request.

## Current verification

- `GraphExtractFlow.prepare()` hardcodes `prepared_input.split_type = "document"` in `hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py:46`.
- The flow registers `ChunkSplitNode` in `hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py:61`, but the `document` split is an identity operation: the input passes through as a single chunk.
- `ChunkSplit` supports `document`, `paragraph`, and `sentence` in `hugegraph-llm/src/hugegraph_llm/operators/document_op/chunk_split.py`.
- The demo helper `extract_graph()` calls `SchedulerSingleton.schedule_flow(FlowName.GRAPH_EXTRACT, schema, texts, example_prompt, "property_graph")` and does not pass any split option.

## Suggested scope

- Add a `split_type` parameter to `GraphExtractFlow.prepare()` and `build_flow()`.
- Default `split_type` to `document` to preserve compatibility.
- Pass `split_type` from the demo graph extraction controls.
- Keep accepted values aligned with the existing `ChunkSplit` operator: `document`, `paragraph`, `sentence`.
- Return or expose enough chunk metadata to make manual debugging possible, at least `chunk_count` in a debug/meta path.
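To make the suggested scope concrete, here is a minimal, self-contained sketch of the intended behavior: an optional `split_type` that defaults to `document`, rejects unknown values with a clear error, and reports `chunk_count`. It does not import or reproduce any `hugegraph_llm` code. The names `resolve_split_type`, `naive_split`, and `prepare_chunks` are hypothetical, and the regex-based splitter is only a stand-in for illustration; the real `ChunkSplit` operator may split text differently.

```python
import re
from typing import List, Optional

# Accepted values are the ones named in this issue; the constants are hypothetical.
ALLOWED_SPLIT_TYPES = ("document", "paragraph", "sentence")
DEFAULT_SPLIT_TYPE = "document"


def resolve_split_type(split_type: Optional[str] = None) -> str:
    """Return a validated split type, falling back to the current default."""
    if split_type is None:
        return DEFAULT_SPLIT_TYPE
    if split_type not in ALLOWED_SPLIT_TYPES:
        raise ValueError(
            f"Invalid split_type {split_type!r}; expected one of {ALLOWED_SPLIT_TYPES}"
        )
    return split_type


def naive_split(text: str, split_type: str) -> List[str]:
    """Stand-in splitter for illustration only; not the real ChunkSplit logic."""
    if split_type == "document":
        # Identity behavior: the whole text is one chunk.
        return [text]
    if split_type == "paragraph":
        # Paragraphs are assumed to be separated by blank lines.
        parts = re.split(r"\n\s*\n", text)
    else:  # "sentence"
        # Split on whitespace that follows sentence-ending punctuation.
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def prepare_chunks(text: str, split_type: Optional[str] = None) -> dict:
    """Validate the option, split the text, and expose chunk_count as metadata."""
    resolved = resolve_split_type(split_type)
    chunks = naive_split(text, resolved)
    return {"split_type": resolved, "chunks": chunks, "chunk_count": len(chunks)}
```

Callers that omit `split_type` get a single chunk, matching today's behavior, while `prepare_chunks(text, "paragraph")` or `prepare_chunks(text, "sentence")` yields multiple chunks for suitable input. Validating before splitting keeps the failure mode for a mistyped value obvious and easy to test.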
## Mermaid reference

```mermaid
flowchart LR
    UI[Demo graph extraction UI] -->|split_type| Utils[extract_graph helper]
    Utils --> Scheduler[SchedulerSingleton]
    Scheduler --> Flow[GraphExtractFlow.prepare]
    Flow --> Split[ChunkSplitNode]
    Split -->|chunks| Extract[ExtractNode / PropertyGraphExtract]
    Extract --> Result[vertices and edges]
```

## Acceptance criteria

- Users can choose graph extraction split type in the demo.
- Existing callers that do not pass `split_type` still behave as before.
- `paragraph` and `sentence` split types produce multiple chunks for suitable input.
- Invalid split types fail with a clear error.
- The extraction result exposes `chunk_count` when debug/meta output is requested, or logs it in a clearly testable path.

## Suggested tests

- Unit or flow-level test showing `GraphExtractFlow.prepare()` preserves default `document`.
- Flow or node test showing a non-default split type reaches `ChunkSplitNode`.
- Demo helper test or narrowly scoped API-level test for passing the selected value.

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
