Copilot commented on code in PR #424:
URL: https://github.com/apache/incubator-hugegraph-doc/pull/424#discussion_r2480915486

##########
content/en/blog/hugegraph-ai/agentic_graphrag.md:
##########
@@ -0,0 +1,452 @@
+---
+date: 2025-10-29
+title: "Agentic GraphRAG"
+linkTitle: "Agentic GraphRAG"
+---
+
+# Project Background
+
+To address the problem of temporal discrepancies between model training data and real-world data, Retrieval-Augmented Generation (RAG) technology has emerged. RAG, as the name suggests, is a technique that retrieves relevant data from external data sources (Retrieval) to augment (Argument) the quality of the answers generated (Generation) by large language models.
+
+The earliest RAG employed a simple Retrieval-Generation architecture. We take the user's question, perform some pre-processing (keyword extraction, etc.), obtain the pre-processed question, and then use an Embedding Model to grab relevant information from a vast amount of data as a Prompt, which is then fed to the large language model to enhance the quality of its responses.
+
+However, relying solely on semantic similarity matching to retrieve relevant information may not handle all situations, as the information that can enhance answer quality may not always be semantically similar to the question itself. A common example is: "Tell me the ontological view of the disciple of the philosopher who proposed that water is the origin of all things." Our data may not directly contain the answer to this question. The knowledge base might contain:
+
+1. Thales proposed that water is the origin of all things.
+2. Anaximander was a disciple of Thales.
+3. Anaximander identified the Apeiron, which has no formal definition, as the origin of all things.
+
+If we rely solely on semantic similarity matching, we are likely to only retrieve the first sentence to augment the large language model's answer. However, without information from sentences 2 and 3, and if the large language model lacks philosophy-related knowledge in its training dxata, it will be unable to correctly answer the question and might even "hallucinate."

Review Comment:
Corrected spelling of 'dxata' to 'data'.
```suggestion
If we rely solely on semantic similarity matching, we are likely to only retrieve the first sentence to augment the large language model's answer. However, without information from sentences 2 and 3, and if the large language model lacks philosophy-related knowledge in its training data, it will be unable to correctly answer the question and might even "hallucinate."
```

##########
content/cn/blog/hugegraph-ai/agentic_graphrag.md:
##########
@@ -0,0 +1,449 @@
+---
+date: 2025-10-29

Review Comment:
The date '2025-10-29' is in the future. This should match the correct publication date and should be consistent with the English version. Verify the intended publication date.
```suggestion
date: 2024-05-29
```

##########
content/en/blog/hugegraph-ai/agentic_graphrag.md:
##########
@@ -0,0 +1,452 @@
+---
+date: 2025-10-29

Review Comment:
The date '2025-10-29' is in the future. If this is intended to be the publication date, it should likely be '2024-10-29' or the current actual date. Using a future date may cause issues with date-based sorting or filtering of blog posts.
```suggestion
date: 2024-10-29
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
