This is an automated email from the ASF dual-hosted git repository.

jin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-ai.git


The following commit(s) were added to refs/heads/main by this push:
     new 8c1ffbb  refactor(llm): improve graph extraction default prompt (#187)
8c1ffbb is described below

commit 8c1ffbb7373ab7a3ffac5f1ec99788c5a0f146ce
Author: Kryst4lDem0ni4s <126225558+kryst4ldem0n...@users.noreply.github.com>
AuthorDate: Mon Mar 3 13:49:45 2025 +0530

    refactor(llm): improve graph extraction default prompt (#187)
    
    TODO:
    sync CN version
    
    ---------
    
    Co-authored-by: imbajin <j...@apache.org>
---
 .../src/hugegraph_llm/config/prompt_config.py      | 62 +++++++++++++++-------
 1 file changed, 42 insertions(+), 20 deletions(-)

diff --git a/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py b/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py
index cdab6be..ad32bbc 100644
--- a/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py
+++ b/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py
@@ -40,42 +40,64 @@ Answer:
 
     default_question: str = """Tell me about Sarah."""
 
-    # Data is detached from hugegraph-llm/src/hugegraph_llm/operators/llm_op/property_graph_extract.py
+    # Note: Users should modify the prompt(examples) according to the real schema and text (property_graph_extract.py)
     extract_graph_prompt: str = """## Main Task
 Given the following graph schema and a piece of text, your task is to analyze the text and extract information that fits into the schema's structure, formatting the information into vertices and edges as specified.
 
-## Basic Rules
-### Schema Format
+## Basic Rules:
+### Schema Format:
 Graph Schema:
-- Vertices: [List of vertex labels and their properties]
-- Edges: [List of edge labels, their source and target vertex labels, and properties]
+- "vertices": [List of vertex labels and their properties]
+- "edges": [List of edge labels, their source and target vertex labels, and properties]
 
-### Content Rule
-Please read the provided text carefully and identify any information that corresponds to the vertices and edges defined in the schema. For each piece of information that matches a vertex or edge, format it according to the following JSON structures:
+### Content Rule:
+Please read the provided text carefully and identify any information that corresponds to the vertices and edges defined in the schema. 
+You are not allowed to modify the schema contraints. Your task is to format the provided information into the required schema, without missing any keyword.
+For each piece of information that matches a vertex or edge, format it strictly according to the following JSON structures:
 
 #### Vertex Format:
 
 {"id":"vertexLabelID:entityName","label":"vertexLabel","type":"vertex","properties":{"propertyName":"propertyValue", ...}}
 
+where:
+    - "vertexLabelID": int
+    - "vertexLabel": str
+    - "entityName": str
+    - "type": "vertex"
+    - "properties": dict
+
 #### Edge Format:
-{"label":"edgeLabel","type":"edge","outV":"sourceVertexId","outVLabel":"sourceVertexLabel","inV":"targetVertexId","inVLabel":"targetVertexLabel","properties":{"propertyName":"propertyValue",...}}
-Also follow the rules: 
-1. Don't extract property fields or labels that doesn't exist in the given schema 
-2. Ensure the extracted property set in the same type as the given schema (like 'age' should be a number, 'select' should be a boolean)
-3. If there are multiple primary keys, the strategy for generating VID is: vertexlabelID:pk1!pk2!pk3 (pk means primary key, and '!' is the separator)
-4. Output in JSON format, only include vertexes and edges & remove empty properties, extracted and formatted based on the text/rules and schema
-5. Translate the schema fields into Chinese if the given text is Chinese but the schema is in English (Optional)
-
-## Example
+{"id":"vertexlabelID:pk1!pk2!pk3", label":"edgeLabel","type":"edge","outV":"sourceVertexId","outVLabel":"sourceVertexLabel","inV":"targetVertexId","inVLabel":"targetVertexLabel","properties":{"propertyName":"propertyValue",...}}
+
+where:
+    - "id": int or str (conditional) (optional)
+    - "edgeLabel": str
+    - "type": "edge"
+    - "outV": str
+    - "outVLabel": str
+    - "inV": str
+    - "inVLabel": str
+    - "properties": dict
+    - "sourceVertexId": "vertexLabelID:entityName"
+    - "targetVertexId": "vertexLabelID:entityName"
+
+Strictly follow these rules: 
+1. Don't extract property fields or labels that doesn't exist in the given schema. Do not generate new information.
+2. Ensure the extracted property set in the same type as the given schema (like 'age' should be a number, 'select' should be a boolean).
+3. If there are multiple primary keys, the strategy for generating VID is: vertexlabelID:pk1!pk2!pk3 (pk means primary key, and '!' is the separator). This id must be generated ONLY if there are multiple primary keys. If there is only one primary key, the strategy for generating VID is: int (sequencially increasing).
+4. Output in JSON format, only include vertexes and edges & remove empty properties, extracted and formatted based on the text/rules and schema.
+5. Translate the schema fields into Chinese if the given text input is Chinese (Optional)
+
+Refer to the following baseline example to understand the output generation requirements:
+## Example:
 ### Input example:
-#### text
+#### text:
 Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010. James, in his professional life, works as a journalist. 
 
 
-#### graph schema
+#### graph schema example:
 
 {"vertices":[{"vertex_label":"person","properties":["name","age","occupation"]}], "edges":[{"edge_label":"roommate", "source_vertex_label":"person","target_vertex_label":"person","properties":["date"]]}
 
 ### Output example:
-[{"id":"1:Sarah","label":"person","type":"vertex","properties":{"name":"Sarah","age":30,"occupation":"attorney"}},{"id":"1:James","label":"person","type":"vertex","properties":{"name":"James","occupation":"journalist"}},{"label":"roommate","type":"edge","outV":"1:Sarah","outVLabel":"person","inV":"1:James","inVLabel":"person","properties":{"date":"2010"}}]
-"""
+{"vertices":[{"id":"1:Sarah","label":"person","type":"vertex","properties":{"name":"Sarah","age":30,"occupation":"attorney"}},{"id":"1:James","label":"person","type":"vertex","properties":{"name":"James","occupation":"journalist"}}], "edges":[{"id": 1, "label":"roommate","type":"edge","outV":"1:Sarah","outVLabel":"person","inV":"1:James","inVLabel":"person","properties":{"date":"2010"}}]}"""
 
     graph_schema: str = """{
 "vertexlabels": [
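
Editorial note (not part of the commit): the revised prompt asks the model for a {"vertices": [...], "edges": [...]} object that only uses labels and properties from the schema, with vertex ids of the form vertexLabelID:entityName (or vertexLabelID:pk1!pk2!pk3 for several primary keys). The sketch below shows one way such output could be checked after the LLM call. The names make_vid, check_output, SCHEMA and EXAMPLE_OUTPUT are hypothetical and do not exist in hugegraph-llm. The schema literal here closes the edge object, which the example schema in the prompt does not.

```python
import json

# Schema and output copied from the prompt's example (assumption: the edge
# object in the schema is meant to be closed with "}").
SCHEMA = {
    "vertices": [{"vertex_label": "person",
                  "properties": ["name", "age", "occupation"]}],
    "edges": [{"edge_label": "roommate", "source_vertex_label": "person",
               "target_vertex_label": "person", "properties": ["date"]}],
}

EXAMPLE_OUTPUT = (
    '{"vertices":[{"id":"1:Sarah","label":"person","type":"vertex",'
    '"properties":{"name":"Sarah","age":30,"occupation":"attorney"}},'
    '{"id":"1:James","label":"person","type":"vertex",'
    '"properties":{"name":"James","occupation":"journalist"}}],'
    '"edges":[{"id": 1,"label":"roommate","type":"edge","outV":"1:Sarah",'
    '"outVLabel":"person","inV":"1:James","inVLabel":"person",'
    '"properties":{"date":"2010"}}]}'
)


def make_vid(label_id: int, primary_keys: list[str]) -> str:
    """Join a label ID and primary-key values as 'labelID:pk1!pk2!pk3'."""
    return f"{label_id}:{'!'.join(primary_keys)}"


def check_output(raw: str, schema: dict) -> list[str]:
    """Return the problems found in an extraction result (empty if none)."""
    data = json.loads(raw)
    vertex_props = {v["vertex_label"]: set(v["properties"])
                    for v in schema["vertices"]}
    edge_props = {e["edge_label"]: set(e["properties"])
                  for e in schema["edges"]}
    problems = []
    vertex_ids = set()

    for v in data.get("vertices", []):
        allowed = vertex_props.get(v["label"])
        if allowed is None:
            problems.append(f"unknown vertex label: {v['label']}")
            continue
        extra = set(v["properties"]) - allowed
        if extra:
            problems.append(f"vertex {v['id']}: unknown properties {sorted(extra)}")
        vertex_ids.add(v["id"])

    for e in data.get("edges", []):
        allowed = edge_props.get(e["label"])
        if allowed is None:
            problems.append(f"unknown edge label: {e['label']}")
            continue
        extra = set(e["properties"]) - allowed
        if extra:
            problems.append(f"edge {e['label']}: unknown properties {sorted(extra)}")
        # Both endpoints must refer to a vertex extracted from the same text.
        for end in ("outV", "inV"):
            if e[end] not in vertex_ids:
                problems.append(f"edge {e['label']}: {end} {e[end]!r} is not an extracted vertex")

    return problems


if __name__ == "__main__":
    print(make_vid(1, ["Sarah"]))
    print(check_output(EXAMPLE_OUTPUT, SCHEMA))
```

A check like this only covers labels, property names and edge endpoints; property types (rule 2 in the prompt) would need type information that the example schema does not carry.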
