bitflicker64 opened a new issue, #365: URL: https://github.com/apache/hugegraph-ai/issues/365
### Search before asking - [x] I had searched in the [feature](https://github.com/apache/hugegraph-ai/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Feature Description (功能描述) ## Summary `hugegraph-llm/src/hugegraph_llm/models/llms/ollama.py` uses the abandoned [`retry`](https://pypi.org/project/retry/) PyPI package and has several inconsistencies compared to the established patterns in `openai.py` and `litellm.py` in the same directory. This issue tracks bringing `ollama.py` in line with those patterns. --- ## Problems ### 1. Uses the abandoned `retry` package instead of `tenacity` `ollama.py` imports from the `retry` PyPI package, which has been [abandoned since 2016](https://github.com/invl/retry) and has no async support guarantees. `openai.py` and `litellm.py` in the same directory already use `tenacity`, the actively maintained fork. ```python # ollama.py — current from retry import retry # openai.py / litellm.py — established pattern from tenacity import ( retry, retry_if_exception_type, stop_after_attempt, wait_exponential, ) ``` The `retry` package is also declared as an explicit dependency in `hugegraph-llm/pyproject.toml`. It should be removed and replaced with `tenacity` as an explicit dependency (currently `tenacity` is only pulled in transitively via `litellm`/`openai`). ### 2. No exception filtering — retries on any `Exception` The current decorator retries on **every** exception, including `KeyError` (from dict access on a malformed response) and `AssertionError` (from the `assert prompt is not None` guard that sits outside the `try` block). `openai.py` uses `retry_if_exception_type` to restrict retries to transient errors only. ```python # ollama.py — current: retries on anything @retry(tries=3, delay=1) # openai.py — retries only on transient errors @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), retry=retry_if_exception_type((RateLimitError, APIConnectionError, APITimeoutError)), ) ``` For `ollama.py`, the relevant exception types are `ollama.ResponseError` and network-level errors from `httpx` (`httpx.ConnectError`, `httpx.TimeoutException`, etc.). ### 3. Flat 1 s delay instead of exponential backoff `@retry(tries=3, delay=1)` waits exactly 1 second between every attempt. `openai.py` uses `wait_exponential(multiplier=1, min=4, max=10)`, which gives the server time to recover from transient overload before retrying. ### 4. `print()` used instead of `log.error()` for retry messages Both `generate` and `agenerate` use `print()` when logging a retry: ```python # ollama.py — current print(f"Retrying LLM call {e}") # openai.py — established pattern log.error("Retrying LLM call %s", e) ``` The `log` module is already imported and used on the success path in `ollama.py` (`log.info("Token usage: ...")`), so this is an inconsistency within the file itself. ### 5. `agenerate_streaming` prints "Retrying LLM call" but has no `@retry` decorator `agenerate_streaming` has the same `print(f"Retrying LLM call {e}")` message in its `except` block, but no `@retry` decorator is applied. The message is misleading — no retry occurs. ```python # ollama.py except Exception as e: print(f"Retrying LLM call {e}") # no retry actually happens raise e ``` ### 6. Docstrings are placeholder text `generate` and `agenerate` both have `"""Comment"""` as their docstring. `openai.py` has real descriptions (e.g., `"""Generate a response to the query messages/prompt."""`). --- ## Proposed fix for the retry decorator (before/after) **Before (`ollama.py`):** ```python from retry import retry @retry(tries=3, delay=1) def generate(self, ...) -> str: """Comment""" ... except Exception as e: print(f"Retrying LLM call {e}") raise e @retry(tries=3, delay=1) async def agenerate(self, ...) -> str: """Comment""" ... except Exception as e: print(f"Retrying LLM call {e}") raise e ``` **After (matching `openai.py` pattern):** ```python import httpx import ollama from tenacity import ( retry, retry_if_exception_type, stop_after_attempt, wait_exponential, ) @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), retry=retry_if_exception_type((ollama.ResponseError, httpx.ConnectError, httpx.TimeoutException)), ) def generate(self, ...) -> str: """Generate a response to the query messages/prompt.""" ... except (ollama.ResponseError, httpx.ConnectError, httpx.TimeoutException) as e: log.error("Retrying LLM call %s", e) raise e @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), retry=retry_if_exception_type((ollama.ResponseError, httpx.ConnectError, httpx.TimeoutException)), ) async def agenerate(self, ...) -> str: """Generate a response to the query messages/prompt asynchronously.""" ... except (ollama.ResponseError, httpx.ConnectError, httpx.TimeoutException) as e: log.error("Retrying LLM call %s", e) raise e ``` `hugegraph-llm/pyproject.toml` should also be updated: remove `"retry"` and add `"tenacity"` as an explicit dependency. If `ollama.py` is the only consumer of the `retry` package, the `"retry~=0.9.2"` constraint in the root `pyproject.toml` can be removed as well. --- This was partially addressed in #143 which migrated openai.py — ollama.py was not included in that PR. ## Affected files - `hugegraph-llm/src/hugegraph_llm/models/llms/ollama.py` - `hugegraph-llm/pyproject.toml` - `pyproject.toml` (root, if `retry` constraint can be dropped) ### Are you willing to submit a PR? - [x] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
