Re: [PR] refactor(llm): Tag configuration to be confirmed and collated [incubator-hugegraph-ai]

via GitHub Sat, 07 Sep 2024 22:59:06 -0700


ChenZiHong-Gavin commented on code in PR #77:
URL: https://github.com/apache/incubator-hugegraph-ai/pull/77#discussion_r1749090090



##########
hugegraph-llm/src/hugegraph_llm/config/config_prompt.yaml:
##########
@@ -0,0 +1,118 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+rag_schema: |
+  {
+  "vertexlabels": [
+      {
+      "id": 1,
+      "name": "person",
+      "id_strategy": "PRIMARY_KEY",
+      "primary_keys": [
+          "name"
+      ],
+      "properties": [
+          "name",
+          "age",
+          "occupation"
+      ]
+      },
+      {
+      "id": 2,
+      "name": "webpage",
+      "id_strategy": "PRIMARY_KEY",
+      "primary_keys": [
+          "name"
+      ],
+      "properties": [
+          "name",
+          "url"
+      ]
+      }
+  ],
+  "edgelabels": [
+      {
+      "id": 1,
+      "name": "roommate",
+      "source_label": "person",
+      "target_label": "person",
+      "properties": [
+          "date"
+      ]
+      },
+      {
+      "id": 2,
+      "name": "link",
+      "source_label": "webpage",
+      "target_label": "person",
+      "properties": []
+      }
+  ]
+  }
+
+schema_example_prompt: |
+    ## Main Task
+    Given the following graph schema and a piece of text, your task is to analyze the text and extract information that fits into the schema's structure, formatting the information into vertices and edges as specified.
+    
+    ## Basic Rules
+    ### Schema Format
+    Graph Schema:
+    - Vertices: [List of vertex labels and their properties]
+    - Edges: [List of edge labels, their source and target vertex labels, and properties]
+    
+    ### Content Rule
+    Please read the provided text carefully and identify any information that corresponds to the vertices and edges defined in the schema. For each piece of information that matches a vertex or edge, format it according to the following JSON structures:
+    #### Vertex Format:
+    {"id":"vertexLabelID:entityName","label":"vertexLabel","type":"vertex","properties":{"propertyName":"propertyValue", ...}}
+    
+    #### Edge Format:
+    {"label":"edgeLabel","type":"edge","outV":"sourceVertexId","outVLabel":"sourceVertexLabel","inV":"targetVertexId","inVLabel":"targetVertexLabel","properties":{"propertyName":"propertyValue",...}}
+    
+    Also follow the rules: 
+    1. Don't extract property fields that do not exist in the given schema
+    2. Ensure the extracted property is in the same type as the schema (like 'age' should be a number)
+    3. If there are multiple primary keys, the strategy for generating VID is: vertexlabelID:pk1!pk2!pk3 (pk means primary key, and '!' is the separator)
+    4. Output should be a list of JSON objects, each representing a vertex or an edge, extracted and formatted based on the text and schema.
+    5. Translate the schema fields into Chinese if the given text is Chinese but the schema is in English (Optional)
+    
+    ## Example
+    ### Input example:
+    #### text
+    Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010. James, in his professional life, works as a journalist.
+    
+    #### graph schema
+    {"vertices":[{"vertex_label":"person","properties":["name","age","occupation"]}], "edges":[{"edge_label":"roommate", "source_vertex_label":"person","target_vertex_label":"person","properties":["date"]]}
+    
+    ### Output example:
+    [{"id":"1:Sarah","label":"person","type":"vertex","properties":{"name":"Sarah","age":30,"occupation":"attorney"}},{"id":"1:James","label":"person","type":"vertex","properties":{"name":"James","occupation":"journalist"}},{"label":"roommate","type":"edge","outV":"1:Sarah","outVLabel":"person","inV":"1:James","inVLabel":"person","properties":{"date":"2010"}}]
+
+docs_build_rag: |
+    ## 1. Build vector/graph RAG (💡)
+    - Doc(s): Upload document file(s) which should be TXT or DOCX. (Multiple files can be selected together)
+    - Schema: Accepts two types of text as below:
+        - User-defined JSON format Schema.
+        - Specify the name of the HugeGraph graph instance, it will automatically get the schema from it.
+    - Info extract head: The head of prompt of info extracting.
+    - Build mode: 
+        - Test Mode: Only extract vertices and edges from the file into memory (without building the vector index or 
+        writing data into HugeGraph)
+        - Import Mode: Extract the data and append it to HugeGraph & the vector index (without clearing any existing data)
+        - Clear and Import: Clear all existed RAG data(vector + graph), then rebuild them from the current input
+        - Rebuild Vector: Only rebuild vector index. (keep the graph data intact)
+        
+        

Review Comment:
   Again, a newline is needed at the end of the file.
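
   A side note on the prompt in this file: rules 1 and 3 (no properties outside the schema, and VIDs built as `vertexlabelID:pk1!pk2!pk3`) could be checked mechanically on the model output. Below is a minimal sketch, not code from this PR. `make_vid` and `unknown_properties` are hypothetical helper names, and the schema is a trimmed copy of `rag_schema`.

```python
import json


def make_vid(label_id, pk_values):
    """Build a vertex id as 'vertexlabelID:pk1!pk2!pk3' (rule 3 of the prompt)."""
    return f"{label_id}:" + "!".join(str(v) for v in pk_values)


def unknown_properties(item, schema):
    """Return property names of an extracted item that its label does not declare (rule 1)."""
    key = "vertexlabels" if item["type"] == "vertex" else "edgelabels"
    allowed = {label["name"]: set(label["properties"]) for label in schema[key]}
    declared = allowed.get(item["label"], set())
    return sorted(set(item["properties"]) - declared)


# Trimmed copy of rag_schema, for illustration only.
schema = json.loads("""
{"vertexlabels": [{"id": 1, "name": "person", "primary_keys": ["name"],
                   "properties": ["name", "age", "occupation"]}],
 "edgelabels": [{"id": 1, "name": "roommate", "source_label": "person",
                 "target_label": "person", "properties": ["date"]}]}
""")

sarah = {"id": make_vid(1, ["Sarah"]), "label": "person", "type": "vertex",
         "properties": {"name": "Sarah", "age": 30, "occupation": "attorney"}}
# 'salary' is an invented property, used here to trigger the check.
bad = {"id": make_vid(1, ["James"]), "label": "person", "type": "vertex",
       "properties": {"name": "James", "salary": 1000}}

print(sarah["id"])                        # 1:Sarah
print(unknown_properties(sarah, schema))  # []
print(unknown_properties(bad, schema))    # ['salary']
```

   A check like this would probably fit best as a post-processing step on the extraction result, so a bad item can be dropped or logged before anything is written to the graph.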



##########
hugegraph-llm/src/hugegraph_llm/config/config.py:
##########
@@ -17,64 +17,24 @@
 
 
 import os
+import yaml
 
 from dataclasses import dataclass
 from typing import Literal, Optional
 from dotenv import dotenv_values, set_key
 
 from hugegraph_llm.utils.log import log
+from hugegraph_llm.config.config_data import ConfigData, PromptData
 
 dirname = os.path.dirname
 package_path = dirname(dirname(dirname(dirname(os.path.abspath(__file__)))))
 env_path = os.path.join(package_path, ".env")
+yaml_file_path = os.path.join(package_path, "src/hugegraph_llm/config/config_prompt.yaml")
 
+# TODO: We need to tidy up the partition settings

Review Comment:
   Remove this TODO if it is done?
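
   For reference, here is a small sketch of what the path constants in this hunk evaluate to. It assumes `config.py` sits at `hugegraph-llm/src/hugegraph_llm/config/config.py` (as in the diff header) under a hypothetical checkout root `/repo`. `resolve_config_paths` is an illustrative name, not a function in the PR.

```python
from pathlib import PurePosixPath


def resolve_config_paths(config_py):
    """Mirror the four dirname() calls from the hunk for a given config.py path."""
    # parents[3] climbs config/ -> hugegraph_llm/ -> src/ -> package root,
    # which is what dirname(dirname(dirname(dirname(path)))) yields.
    package_path = PurePosixPath(config_py).parents[3]
    return {
        "package_path": str(package_path),
        "env_path": str(package_path / ".env"),
        "yaml_file_path": str(
            package_path / "src/hugegraph_llm/config/config_prompt.yaml"
        ),
    }


# '/repo' is a made-up checkout location used only for this example.
paths = resolve_config_paths(
    "/repo/hugegraph-llm/src/hugegraph_llm/config/config.py"
)
for name, value in paths.items():
    print(name, "=", value)
```

   Since `config_prompt.yaml` lives next to `config.py`, resolving it relative to the module directory might be a simpler alternative to joining from the package root, but that is only a suggestion.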



##########
hugegraph-llm/src/hugegraph_llm/demo/rag_web_demo.py:
##########
@@ -542,15 +509,15 @@ def reranker_settings(reranker_type):
         gr.Markdown("""## 2. RAG with HugeGraph 📖""")
         with gr.Row():
             with gr.Column(scale=2):
-                inp = gr.Textbox(value="Tell me about Sarah.", label="Question", show_copy_button=True)
+                inp = gr.Textbox(value=settings.question, label="Question", show_copy_button=True)

Review Comment:
   Perhaps we should standardize the input format here?
   If `lines=2` is set on `answer_prompt_input`, shouldn't `question` get it as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


