[I] [Bug] ollama.py: standardize retry logic and fix inconsistencies to match openai.py/litellm.py [hugegraph-ai]

via GitHub Sun, 21 Jun 2026 21:54:18 -0700


bitflicker64 opened a new issue, #365:
URL: https://github.com/apache/hugegraph-ai/issues/365


   ### Search before asking
   
   - [x] I had searched in the 
[feature](https://github.com/apache/hugegraph-ai/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Feature Description (功能描述)
   
   ## Summary
   
   `hugegraph-llm/src/hugegraph_llm/models/llms/ollama.py` uses the abandoned 
[`retry`](https://pypi.org/project/retry/) PyPI package and has several 
inconsistencies compared to the established patterns in `openai.py` and 
`litellm.py` in the same directory. This issue tracks bringing `ollama.py` in 
line with those patterns.
   
   ---
   
   ## Problems
   
   ### 1. Uses the abandoned `retry` package instead of `tenacity`
   
   `ollama.py` imports from the `retry` PyPI package, which has been [abandoned 
since 2016](https://github.com/invl/retry) and has no async support guarantees. 
`openai.py` and `litellm.py` in the same directory already use `tenacity`, the 
actively maintained fork.
   
   ```python
   # ollama.py — current
   from retry import retry
   
   # openai.py / litellm.py — established pattern
   from tenacity import (
       retry,
       retry_if_exception_type,
       stop_after_attempt,
       wait_exponential,
   )
   ```
   
   The `retry` package is also declared as an explicit dependency in 
`hugegraph-llm/pyproject.toml`. It should be removed and replaced with 
`tenacity` as an explicit dependency (currently `tenacity` is only pulled in 
transitively via `litellm`/`openai`).
   
   ### 2. No exception filtering — retries on any `Exception`
   
   The current decorator retries on **every** exception, including `KeyError` 
(from dict access on a malformed response) and `AssertionError` (from the 
`assert prompt is not None` guard that sits outside the `try` block). 
`openai.py` uses `retry_if_exception_type` to restrict retries to transient 
errors only.
   
   ```python
   # ollama.py — current: retries on anything
   @retry(tries=3, delay=1)
   
   # openai.py — retries only on transient errors
   @retry(
       stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=4, max=10),
       retry=retry_if_exception_type((RateLimitError, APIConnectionError, 
APITimeoutError)),
   )
   ```
   
   For `ollama.py`, the relevant exception types are `ollama.ResponseError` and 
network-level errors from `httpx` (`httpx.ConnectError`, 
`httpx.TimeoutException`, etc.).
   
   ### 3. Flat 1 s delay instead of exponential backoff
   
   `@retry(tries=3, delay=1)` waits exactly 1 second between every attempt. 
`openai.py` uses `wait_exponential(multiplier=1, min=4, max=10)`, which gives 
the server time to recover from transient overload before retrying.
   
   ### 4. `print()` used instead of `log.error()` for retry messages
   
   Both `generate` and `agenerate` use `print()` when logging a retry:
   
   ```python
   # ollama.py — current
   print(f"Retrying LLM call {e}")
   
   # openai.py — established pattern
   log.error("Retrying LLM call %s", e)
   ```
   
   The `log` module is already imported and used on the success path in 
`ollama.py` (`log.info("Token usage: ...")`), so this is an inconsistency 
within the file itself.
   
   ### 5. `agenerate_streaming` prints "Retrying LLM call" but has no `@retry` 
decorator
   
   `agenerate_streaming` has the same `print(f"Retrying LLM call {e}")` message 
in its `except` block, but no `@retry` decorator is applied. The message is 
misleading — no retry occurs.
   
   ```python
   # ollama.py
   except Exception as e:
       print(f"Retrying LLM call {e}")  # no retry actually happens
       raise e
   ```
   
   ### 6. Docstrings are placeholder text
   
   `generate` and `agenerate` both have `"""Comment"""` as their docstring. 
`openai.py` has real descriptions (e.g., `"""Generate a response to the query 
messages/prompt."""`).
   
   ---
   
   ## Proposed fix for the retry decorator (before/after)
   
   **Before (`ollama.py`):**
   ```python
   from retry import retry
   
   @retry(tries=3, delay=1)
   def generate(self, ...) -> str:
       """Comment"""
       ...
       except Exception as e:
           print(f"Retrying LLM call {e}")
           raise e
   
   @retry(tries=3, delay=1)
   async def agenerate(self, ...) -> str:
       """Comment"""
       ...
       except Exception as e:
           print(f"Retrying LLM call {e}")
           raise e
   ```
   
   **After (matching `openai.py` pattern):**
   ```python
   import httpx
   import ollama
   from tenacity import (
       retry,
       retry_if_exception_type,
       stop_after_attempt,
       wait_exponential,
   )
   
   @retry(
       stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=4, max=10),
       retry=retry_if_exception_type((ollama.ResponseError, httpx.ConnectError, 
httpx.TimeoutException)),
   )
   def generate(self, ...) -> str:
       """Generate a response to the query messages/prompt."""
       ...
       except (ollama.ResponseError, httpx.ConnectError, 
httpx.TimeoutException) as e:
           log.error("Retrying LLM call %s", e)
           raise e
   
   @retry(
       stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=4, max=10),
       retry=retry_if_exception_type((ollama.ResponseError, httpx.ConnectError, 
httpx.TimeoutException)),
   )
   async def agenerate(self, ...) -> str:
       """Generate a response to the query messages/prompt asynchronously."""
       ...
       except (ollama.ResponseError, httpx.ConnectError, 
httpx.TimeoutException) as e:
           log.error("Retrying LLM call %s", e)
           raise e
   ```
   
   `hugegraph-llm/pyproject.toml` should also be updated: remove `"retry"` and 
add `"tenacity"` as an explicit dependency. If `ollama.py` is the only consumer 
of the `retry` package, the `"retry~=0.9.2"` constraint in the root 
`pyproject.toml` can be removed as well.
   
   ---
   
   This was partially addressed in #143 which migrated openai.py — ollama.py 
was not included in that PR.
   
   ## Affected files
   
   - `hugegraph-llm/src/hugegraph_llm/models/llms/ollama.py`
   - `hugegraph-llm/pyproject.toml`
   - `pyproject.toml` (root, if `retry` constraint can be dropped)
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] [Bug] ollama.py: standardize retry logic and fix inconsistencies to match openai.py/litellm.py [hugegraph-ai]

Reply via email to