Copilot commented on code in PR #250:
URL:
https://github.com/apache/incubator-hugegraph-ai/pull/250#discussion_r2104254237
##########
hugegraph-llm/src/hugegraph_llm/models/embeddings/ollama.py:
##########
@@ -19,57 +19,41 @@
from typing import List
import ollama
+
from .base import BaseEmbedding
class OllamaEmbedding(BaseEmbedding):
- def __init__(
- self,
- model: str,
- host: str = "127.0.0.1",
- port: int = 11434,
- **kwargs
- ):
+ def __init__(self, model: str, host: str = "127.0.0.1", port: int = 11434,
**kwargs):
self.model = model
self.client = ollama.Client(host=f"http://{host}:{port}", **kwargs)
self.async_client = ollama.AsyncClient(host=f"http://{host}:{port}",
**kwargs)
self.embedding_dimension = None
- def get_text_embedding(
- self,
- text: str
- ) -> List[float]:
- """Comment"""
- return list(self.client.embed(model=self.model,
input=text)["embeddings"][0])
+ def get_text_embedding(self, text: str) -> List[float]:
+ """Get embedding for a single text."""
+ return self.get_texts_embeddings([text])[0]
- def get_texts_embeddings(
- self,
- texts: List[str]
- ) -> List[List[float]]:
+ def get_texts_embeddings(self, texts: List[str]) -> List[List[float]]:
"""Get embeddings for multiple texts in a single batch.
-
- This method efficiently processes multiple texts at once by leveraging
- Ollama's batching capabilities, which is more efficient than processing
- texts individually.
-
- Parameters
- ----------
- texts : List[str]
- A list of text strings to be embedded.
-
+
Returns
-------
List[List[float]]
A list of embedding vectors, where each vector is a list of floats.
The order of embeddings matches the order of input texts.
"""
- response = self.client.embed(model=self.model,
input=texts)["embeddings"]
- return [list(inner_sequence) for inner_sequence in response]
-
- async def async_get_text_embedding(
- self,
- text: str
- ) -> List[float]:
- """Comment"""
+ if hasattr(self.client, "embed"):
Review Comment:
If `client.embed` is missing, the code raises an `AttributeError` instead of
falling back to a synchronous alternative (e.g., `client.embeddings` via
`run_in_executor`). Implement a real synchronous fallback, or document that
batch embedding is unsupported by older client versions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]