Re: [PR] feat(llm): (BREAKING CHANGE) update keyword extraction method [incubator-hugegraph-ai]

via GitHub Mon, 20 Oct 2025 04:52:18 -0700


imbajin commented on code in PR #282:
URL: https://github.com/apache/incubator-hugegraph-ai/pull/282#discussion_r2444791138



##########
hugegraph-llm/src/hugegraph_llm/config/llm_config.py:
##########
@@ -30,6 +30,10 @@ class LLMConfig(BaseConfig):
     text2gql_llm_type: Literal["openai", "litellm", "ollama/local"] = "openai"
     embedding_type: Optional[Literal["openai", "litellm", "ollama/local"]] = "openai"
     reranker_type: Optional[Literal["cohere", "siliconflow"]] = None
+    keyword_extract_type: Literal["llm", "textrank", "hybrid"] = "llm"
+    window_size: Optional[int] = 3
+    hybrid_llm_weights: Optional[float] = 0.5

Review Comment:
   ⚠️ **Important: Missing validation for hybrid_llm_weights**
   
   The config accepts `hybrid_llm_weights` but doesn't validate the 0.0-1.0 range at initialization.
   
   **Recommendation:**
   Add Pydantic field validation:
   ```python
   from pydantic import Field
   
   hybrid_llm_weights: Optional[float] = Field(
       default=0.5,
       ge=0.0,
       le=1.0,
       description="LLM weight in hybrid mode (0.0-1.0)"
   )
   ```
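
To illustrate the effect of the suggested constraint, here is a minimal sketch. `KeywordExtractConfig` and `is_valid_weight` are hypothetical names used only for illustration, and a plain pydantic `BaseModel` stands in for the project's `BaseConfig`, whose definition is not shown in this diff.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class KeywordExtractConfig(BaseModel):
    """Hypothetical stand-in for LLMConfig, reduced to the field under review."""

    hybrid_llm_weights: Optional[float] = Field(
        default=0.5,
        ge=0.0,  # inclusive lower bound
        le=1.0,  # inclusive upper bound
        description="LLM weight in hybrid mode (0.0-1.0)",
    )


def is_valid_weight(value: float) -> bool:
    """Return True if the model accepts the weight, False if validation rejects it."""
    try:
        KeywordExtractConfig(hybrid_llm_weights=value)
        return True
    except ValidationError:
        return False
```

With these bounds, `is_valid_weight(1.5)` and `is_valid_weight(-0.1)` return `False` because construction raises `ValidationError`, while the boundary values 0.0 and 1.0 are accepted.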



##########
hugegraph-llm/src/hugegraph_llm/operators/common_op/nltk_helper.py:
##########
@@ -47,11 +50,64 @@ def stopwords(self, lang: str = "chinese") -> List[str]:
             try:
                 nltk.data.find("corpora/stopwords")
             except LookupError:
-                nltk.download("stopwords", download_dir=nltk_data_dir)
+                try:
+                    log.info("Start download nltk package stopwords")
+                    nltk.download("stopwords", download_dir=nltk_data_dir, quiet=False)
+                    log.debug("NLTK package stopwords is already downloaded")
+                except (URLError, HTTPError, PermissionError) as e:
+                    log.warning("Can't download package stopwords as error: %s", e)
+        try:
             self._stopwords[lang] = stopwords.words(lang)
+        except LookupError as e:
+            log.warning("NLTK stopwords for lang=%s not found: %s; using empty list", lang, e)
+            self._stopwords[lang] = []
+
+        # final check

Review Comment:
   ⚠️ **Code Quality: Duplicate NLTK download logic**
   
   The NLTK package download logic is duplicated between the `stopwords()` and `check_nltk_data()` methods, violating the DRY principle.
   
   **Recommendation:**
   Extract common download logic:
   ```python
   def _download_nltk_package(self, package: str, path: str, nltk_data_dir: str) -> bool:
       """Return True if the resource at `path` is present or was downloaded."""
       try:
           nltk.data.find(path)
           return True
       except LookupError:
           log.info("Downloading NLTK package: %s", package)
           try:
               return nltk.download(package, download_dir=nltk_data_dir, quiet=False)
           except (URLError, HTTPError, PermissionError) as e:
               log.warning("Failed to download %s: %s", package, e)
               return False
   ```
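
As a rough sketch of how a caller might reuse such a helper, the module-level variant below loops over the required resources through one function. The names `ensure_nltk_package` and `REQUIRED_PACKAGES` are hypothetical, and the `punkt` entry is an assumption: the packages that `check_nltk_data()` actually verifies are not shown in this diff.

```python
import logging
from typing import List, Tuple
from urllib.error import HTTPError, URLError

import nltk

log = logging.getLogger(__name__)

# Illustrative (package, resource path) pairs; only "stopwords" appears in the diff.
REQUIRED_PACKAGES: List[Tuple[str, str]] = [
    ("stopwords", "corpora/stopwords"),
    ("punkt", "tokenizers/punkt"),  # assumed for illustration
]


def ensure_nltk_package(package: str, path: str, nltk_data_dir: str) -> bool:
    """Return True if the resource is already present or was downloaded."""
    try:
        nltk.data.find(path)
        return True
    except LookupError:
        log.info("Downloading NLTK package: %s", package)
        try:
            return bool(nltk.download(package, download_dir=nltk_data_dir, quiet=False))
        except (URLError, HTTPError, PermissionError) as e:
            log.warning("Failed to download %s: %s", package, e)
            return False


def check_nltk_data(nltk_data_dir: str) -> bool:
    """Try every required package and report whether all are available."""
    # A list (not a generator) so each package is attempted even after a failure.
    results = [
        ensure_nltk_package(package, path, nltk_data_dir)
        for package, path in REQUIRED_PACKAGES
    ]
    return all(results)
```

A `stopwords()` method could then call `ensure_nltk_package("stopwords", "corpora/stopwords", nltk_data_dir)` before loading the word list, so the download and error handling live in one place.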



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

