[PR] fix(graph): improve property graph JSON parsing robustness for LLM outputs [hugegraph-ai]

via GitHub Mon, 18 May 2026 04:57:48 -0700


linmengmeng-1314 opened a new pull request, #332:
URL: https://github.com/apache/hugegraph-ai/pull/332


   ## Summary
   
   - Improve `_extract_and_filter_label` to handle varying LLM output formats
   - Strip markdown code blocks before JSON extraction
   - Support both `{"vertices":[...], "edges":[...]}` (object) and flat array formats
   - Auto-convert flat arrays to the expected object structure
   
   ## Problem
   
   When using reasoning models (e.g., DeepSeek V4) for graph extraction, the LLM may return:
   1. JSON wrapped in markdown code blocks (```` ```json ... ``` ````), which breaks the greedy regex `({.*})`
   2. A flat array `[vertex, edge, ...]` instead of the expected object `{"vertices": [...], "edges": [...]}`
   
   Both cases cause `json.JSONDecodeError` and result in empty extraction output, even though the LLM correctly identified the entities and relationships.
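To make the flat-array failure (case 2) concrete, here is a minimal reproduction. It assumes the existing code applies the greedy pattern with `re.DOTALL` and then calls `json.loads` on the match; the flag, the helper name `naive_parse`, and the sample LLM outputs are hypothetical and not taken from hugegraph-ai.

```python
import json
import re

# Hypothetical reproduction: the DOTALL flag, the helper name, and the sample
# outputs are assumptions for illustration, not copied from hugegraph-ai.
GREEDY_OBJECT = re.compile(r"({.*})", re.DOTALL)


def naive_parse(text: str):
    """Grab the span from the first '{' to the last '}' and decode it."""
    match = GREEDY_OBJECT.search(text)
    if match is None:
        return None
    return json.loads(match.group(1))


# Expected object format: the greedy match is the whole object.
object_output = '{"vertices": [], "edges": []}'

# Flat array wrapped in a markdown fence, as a reasoning model might emit it.
flat_array_output = (
    "```json\n"
    '[{"type": "vertex", "label": "person"},\n'
    ' {"type": "edge", "label": "knows"}]\n'
    "```"
)

print(naive_parse(object_output))  # decodes to a dict

try:
    naive_parse(flat_array_output)
except json.JSONDecodeError as exc:
    # The match spans '{...},\n {...}': two objects joined by a comma.
    print("JSONDecodeError:", exc.msg)
```

The greedy match starts at the first `{` and ends at the last `}`, so for a flat array it captures two sibling objects separated by a comma; `json.loads` decodes the first one and then rejects the remainder with "Extra data".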
   
   ## Solution
   
   - Strip markdown code fences (```` ```json ```` / ```` ``` ````) before regex matching
   - Update regex to match both objects (`{...}`) and arrays (`[...]`)
   - When a flat array is detected, partition items by `type` field into `vertices` and `edges`
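A minimal sketch of the three steps above, assuming array items are tagged with `"type": "vertex"` or `"type": "edge"`. The function name `parse_graph_output`, both regexes, and those tag values are illustrative assumptions, not the code in this PR.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical sketch of the approach described in the PR; the names, regexes,
# and the "vertex"/"edge" type values are assumptions.
FENCE = re.compile(r"```(?:json)?")
JSON_SPAN = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)


def parse_graph_output(text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Normalize LLM output into {"vertices": [...], "edges": [...]}."""
    # 1. Drop markdown code fences such as ```json and ```.
    cleaned = FENCE.sub("", text).strip()

    # 2. Find the first JSON object or array (greedy, spanning newlines).
    #    json.JSONDecodeError is left to the caller.
    match = JSON_SPAN.search(cleaned)
    if match is None:
        return {"vertices": [], "edges": []}
    data = json.loads(match.group(1))

    # 3. Object format: pass it through, defaulting missing keys.
    if isinstance(data, dict):
        return {
            "vertices": data.get("vertices", []),
            "edges": data.get("edges", []),
        }

    # 4. Flat array format: partition dict items by their "type" field.
    items = [item for item in data if isinstance(item, dict)]
    vertices = [item for item in items if item.get("type") == "vertex"]
    edges = [item for item in items if item.get("type") == "edge"]
    return {"vertices": vertices, "edges": edges}


fenced = (
    '```json\n[{"type": "vertex", "label": "person"}, '
    '{"type": "edge", "label": "knows"}]\n```'
)
print(parse_graph_output(fenced))
```

Because the object path returns the parsed dict's lists unchanged, models that already emit `{"vertices": [...], "edges": [...]}` behave as before. In this sketch, array items whose `type` is neither `vertex` nor `edge` are silently dropped; logging them instead may be preferable.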
   
   ## Test plan
   
   - [ ] Test with OpenAI models (existing behavior should be preserved)
   - [ ] Test with DeepSeek models (markdown-wrapped array format)
   - [ ] Test with Ollama models
   - [ ] Verify both object and array formats are handled correctly
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.