linmengmeng-1314 opened a new pull request, #332:
URL: https://github.com/apache/hugegraph-ai/pull/332
## Summary
- Improve `_extract_and_filter_label` to handle varying LLM output formats
- Strip markdown code blocks before JSON extraction
- Support both `{"vertices":[...], "edges":[...]}` (object) and flat array
formats
- Auto-convert flat arrays to the expected object structure
## Problem
When using reasoning models (e.g., DeepSeek V4) for graph extraction, the
LLM may return:
1. JSON wrapped in markdown code blocks (`\`\`\`json ... \`\`\``), which
breaks the greedy regex `({.*})`
2. A flat array `[vertex, edge, ...]` instead of the expected object
`{"vertices": [...], "edges": [...]}`
Both cases cause `json.JSONDecodeError` and result in empty extraction
output even though the LLM correctly identified entities and relationships.
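
The flat-array failure can be reproduced in isolation. The snippet below is a minimal sketch, not the project's actual code: `naive_extract` is a hypothetical stand-in that assumes the current logic is roughly "apply the greedy regex `({.*})`, then call `json.loads`", and the sample payload is invented for illustration.

```python
import json
import re


def naive_extract(text):
    """Hypothetical stand-in for the old logic: greedy object regex, then json.loads."""
    match = re.search(r"({.*})", text)
    if match is None:
        return None
    return json.loads(match.group(1))


# Invented sample: a flat array instead of {"vertices": [...], "edges": [...]}
flat = '[{"type": "vertex", "label": "person"}, {"type": "edge", "label": "knows"}]'

try:
    naive_extract(flat)
    failed = False
except json.JSONDecodeError:
    # The regex captures '{...}, {...}' without the enclosing brackets,
    # which is two objects back to back and therefore not valid JSON.
    failed = True
```

Because the capture starts at the first `{` and ends at the last `}`, the surrounding `[` and `]` are dropped and the parser sees trailing data after the first object.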
## Solution
- Strip markdown code fences (`\`\`\`json` / `\`\`\``) before regex matching
- Update regex to match both objects (`{...}`) and arrays (`[...]`)
- When a flat array is detected, partition items by `type` field into
`vertices` and `edges`
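
The three steps above could be sketched roughly as follows. This is an illustrative approximation rather than the code in this PR: `parse_graph_json` is a hypothetical helper name, and the `type` values `"vertex"` and `"edge"` are assumed for the example.

```python
import json
import re

# Build the fence marker programmatically so this snippet contains no literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(re.escape(FENCE) + r"(?:json)?", re.IGNORECASE)
# Match either an object {...} or an array [...], spanning newlines.
JSON_RE = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)


def parse_graph_json(text):
    """Hypothetical helper: normalize LLM output to {"vertices": [...], "edges": [...]}."""
    empty = {"vertices": [], "edges": []}

    # Step 1: strip markdown code fences before matching.
    cleaned = FENCE_RE.sub("", text).strip()

    # Step 2: find the first object or array in the remaining text.
    match = JSON_RE.search(cleaned)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return empty

    # Step 3: convert a flat array by partitioning items on their "type" field.
    if isinstance(data, list):
        items = [item for item in data if isinstance(item, dict)]
        return {
            # "vertex" / "edge" are assumed type values for this sketch.
            "vertices": [i for i in items if i.get("type") == "vertex"],
            "edges": [i for i in items if i.get("type") == "edge"],
        }
    if isinstance(data, dict):
        return {
            "vertices": data.get("vertices", []),
            "edges": data.get("edges", []),
        }
    return empty
```

In this sketch, items whose `type` is neither of the two assumed values are dropped silently, and unparseable input yields an empty result instead of raising. Whether that is the right policy for the real extractor is a design choice for the reviewers.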
## Test plan
- [ ] Test with OpenAI models (existing behavior should be preserved)
- [ ] Test with DeepSeek models (markdown-wrapped array format)
- [ ] Test with Ollama models
- [ ] Verify both object and array formats are handled correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]