[I] [Feature] Make graph extraction use configurable chunk splitting [hugegraph-ai]

via GitHub Mon, 25 May 2026 20:43:11 -0700


LRriver opened a new issue, #343:
URL: https://github.com/apache/hugegraph-ai/issues/343


   ### Search before asking
   
   - [x] I have searched the
     [feature](https://github.com/apache/hugegraph-ai/issues?q=is%3Aissue+label%3A%22Feature%22)
     issues and found no similar feature request.
   
   
   ### Feature Description
   
   ## Feature Description
   
   Graph extraction already has a `ChunkSplitNode`, but the default graph
   extraction flow forces `split_type = "document"`, so extraction effectively
   runs on the whole document as a single chunk. Please expose a configurable
   split strategy for graph extraction while keeping the current behavior as
   the default.
   
   This should help users extract knowledge graphs from longer documents without
   sending the entire text to the LLM in one request.
   
   ## Current verification
   
   - `GraphExtractFlow.prepare()` hardcodes
     `prepared_input.split_type = "document"` in
     `hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py:46`.
   - The flow registers `ChunkSplitNode` in
     `hugegraph-llm/src/hugegraph_llm/flows/graph_extract.py:61`, but the
     `document` split is an identity operation: the text passes through as a
     single chunk.
   - `ChunkSplit` supports `document`, `paragraph`, and `sentence` in
     `hugegraph-llm/src/hugegraph_llm/operators/document_op/chunk_split.py`.
   - The demo helper `extract_graph()` calls
     `SchedulerSingleton.schedule_flow(FlowName.GRAPH_EXTRACT, schema, texts, example_prompt, "property_graph")`
     and does not pass any split option.
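
To illustrate why the `document` split amounts to a no-op, here is a minimal
sketch. `split_text` is a hypothetical stand-in written for this issue, not the
real `ChunkSplit` operator, and its regex-based paragraph and sentence rules are
assumptions; the actual operator may chunk differently.

```python
import re


def split_text(text: str, split_type: str = "document") -> list[str]:
    """Hypothetical stand-in for the splitting step (not the real ChunkSplit)."""
    if split_type == "document":
        # Identity behavior: the whole document comes back as one chunk.
        return [text]
    if split_type == "paragraph":
        # Assumed rule: paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if split_type == "sentence":
        # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    raise ValueError(f"Unsupported split_type: {split_type!r}")


doc = "Alice knows Bob. Bob works at Acme.\n\nAcme is based in Berlin."
for kind in ("document", "paragraph", "sentence"):
    print(kind, len(split_text(doc, kind)))
# document 1
# paragraph 2
# sentence 3
```

With `document`, the LLM would receive all three sentences in one request; the
other two modes would give the extraction step smaller inputs.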
   
   ## Suggested scope
   
   - Add a `split_type` parameter to `GraphExtractFlow.prepare()` and
     `build_flow()`.
   - Default `split_type` to `document` to preserve compatibility.
   - Pass `split_type` from the demo graph extraction controls.
   - Keep accepted values aligned with the existing `ChunkSplit` operator:
     `document`, `paragraph`, `sentence`.
   - Return or expose enough chunk metadata to make manual debugging possible,
     at least `chunk_count` in a debug/meta path.
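
A rough sketch of the parameter handling proposed above. `PreparedInput` and
this free-standing `prepare()` are hypothetical stand-ins, not the real
`GraphExtractFlow` code; only the `split_type` name, its three accepted values,
and the `document` default come from this issue.

```python
from dataclasses import dataclass, field

# Accepted values, kept aligned with what the ChunkSplit operator supports.
VALID_SPLIT_TYPES = ("document", "paragraph", "sentence")


@dataclass
class PreparedInput:
    """Hypothetical stand-in for the flow's prepared input object."""
    texts: list = field(default_factory=list)
    split_type: str = "document"


def prepare(prepared_input, texts, split_type="document"):
    """Sketch of prepare() with a configurable, validated split_type."""
    if split_type not in VALID_SPLIT_TYPES:
        raise ValueError(
            f"Unsupported split_type {split_type!r}; "
            f"expected one of {VALID_SPLIT_TYPES}"
        )
    prepared_input.texts = texts
    # Previously hardcoded to "document"; now taken from the caller.
    prepared_input.split_type = split_type
    return prepared_input


# Callers that omit split_type keep today's behavior.
print(prepare(PreparedInput(), ["some text"]).split_type)  # document
```

Validating in `prepare()` rather than deep inside the node is one option; it
would surface a bad value before any LLM call is made.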
   
   ## Mermaid reference
   
   ```mermaid
   flowchart LR
       UI[Demo graph extraction UI] -->|split_type| Utils[extract_graph helper]
       Utils --> Scheduler[SchedulerSingleton]
       Scheduler --> Flow[GraphExtractFlow.prepare]
       Flow --> Split[ChunkSplitNode]
       Split -->|chunks| Extract[ExtractNode / PropertyGraphExtract]
       Extract --> Result[vertices and edges]
   ```
   
   ## Acceptance criteria
   
   - Users can choose graph extraction split type in the demo.
   - Existing callers that do not pass `split_type` still behave as before.
   - `paragraph` and `sentence` split types produce multiple chunks for
     suitable input.
   - Invalid split types fail with a clear error.
   - The extraction result exposes `chunk_count` when debug/meta output is
     requested, or logs it in a clearly testable path.
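
One possible shape for the `chunk_count` criterion, as a sketch only.
`build_result`, the `debug` flag, and the `meta` key are hypothetical names
chosen for illustration; the real result format of the flow may differ.

```python
def build_result(vertices, edges, chunks, debug=False):
    """Hypothetical result builder: adds chunk metadata only on request."""
    result = {"vertices": vertices, "edges": edges}
    if debug:
        # Expose how many chunks were sent to extraction, for manual debugging.
        result["meta"] = {"chunk_count": len(chunks)}
    return result


chunks = ["Alice knows Bob.", "Bob works at Acme."]
print(build_result([], [], chunks, debug=True)["meta"])  # {'chunk_count': 2}
```

Keeping the metadata behind a flag would leave the default output unchanged for
existing callers.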
   
   ## Suggested tests
   
   - Unit or flow-level test showing `GraphExtractFlow.prepare()` preserves
     default `document`.
   - Flow or node test showing a non-default split type reaches
     `ChunkSplitNode`.
   - Demo helper test or narrowly scoped API-level test for passing the
     selected value.
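
The tests above could look roughly like this. The sketch uses the standard
library `unittest` module and a hypothetical stand-in `prepare()` so that it
runs on its own; real tests would exercise `GraphExtractFlow.prepare()` and
`ChunkSplitNode` directly, and the project's test framework may differ.

```python
import unittest
from types import SimpleNamespace

VALID_SPLIT_TYPES = ("document", "paragraph", "sentence")


def prepare(prepared_input, split_type="document"):
    """Hypothetical stand-in for GraphExtractFlow.prepare()."""
    if split_type not in VALID_SPLIT_TYPES:
        raise ValueError(f"Unsupported split_type: {split_type!r}")
    prepared_input.split_type = split_type
    return prepared_input


class PrepareSplitTypeTest(unittest.TestCase):
    def test_default_is_document(self):
        # Callers that pass nothing must keep the current behavior.
        self.assertEqual(prepare(SimpleNamespace()).split_type, "document")

    def test_non_default_is_forwarded(self):
        # The chosen value must end up where the split node reads it.
        prepared = prepare(SimpleNamespace(), "paragraph")
        self.assertEqual(prepared.split_type, "paragraph")

    def test_invalid_value_is_rejected(self):
        with self.assertRaises(ValueError):
            prepare(SimpleNamespace(), "page")
```

Saved as a module, this can be run with `python -m unittest <module>`.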
   
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's
     [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


