pankajkoti commented on code in PR #36085:
URL: https://github.com/apache/airflow/pull/36085#discussion_r1417131009
##########
airflow/providers/weaviate/operators/weaviate.py:
##########
@@ -40,9 +44,11 @@ class WeaviateIngestOperator(BaseOperator):
:param conn_id: The Weaviate connection.
:param class_name: The Weaviate class to be used for storing the data objects into.
+ :param input_data: The list of dicts or pandas dataframe representing Weaviate data objects to generate
+ embeddings on (or provides custom vectors) and store them in the Weaviate class.
:param input_json: The JSON representing Weaviate data objects to generate embeddings on (or provides
Review Comment:
Can we update the docstring to mark `input_json` as deprecated?
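To illustrate the kind of deprecation meant here, a minimal sketch follows. The class name `IngestOperatorSketch` is hypothetical and stands in for the operator; the warning class, the message wording, and the fallback behaviour are assumptions, not the PR's actual code.

```python
from __future__ import annotations

import warnings
from typing import Any


class IngestOperatorSketch:
    """Hypothetical stand-in for the operator, only to show the docstring change.

    :param input_data: The list of dicts representing the data objects to store.
    :param input_json: (Deprecated) Use ``input_data`` instead.
    """

    def __init__(
        self,
        input_data: list[dict[str, Any]] | None = None,
        input_json: list[dict[str, Any]] | None = None,
    ) -> None:
        if input_data is not None:
            self.input_data = input_data
        elif input_json is not None:
            # Assumption: a plain DeprecationWarning; the project may prefer its own class.
            warnings.warn(
                "Passing 'input_json' is deprecated, please pass 'input_data' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            self.input_data = input_json
        else:
            raise TypeError("Either input_data or input_json is required")
```

With this shape, existing callers that still pass `input_json` keep working but see a warning, and the docstring tells new users which parameter to prefer.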
##########
airflow/providers/weaviate/hooks/weaviate.py:
##########
@@ -135,22 +139,40 @@ def create_schema(self, schema_json: dict[str, Any]) -> None:
client = self.conn
client.schema.create(schema_json)
+ @staticmethod
+ def check_http_error_should_retry(exc: BaseException):
+ return isinstance(exc, requests.HTTPError) and not exc.response.ok
+
def batch_data(
- self, class_name: str, data: list[dict[str, Any]], batch_config_params: dict[str, Any] | None = None
+ self,
+ class_name: str,
+ data: list[dict[str, Any]] | pd.DataFrame,
+ batch_config_params: dict[str, Any] | None = None,
+ vector_col: str = "Vector",
+ no_retry_attempts_per_object: int = 5,
Review Comment:
```suggestion
retry_attempts_per_object: int = 5,
```
I am guessing `no` here stands for "number", but on first read I took it as
"NO" (a negation).
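To show how the suggested name reads at a call site, here is a small sketch. The helper `call_with_retries` is hypothetical, and it assumes the parameter counts total attempts rather than retries after the first failure; the PR may define it differently.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], retry_attempts_per_object: int = 5) -> T:
    """Call ``func`` up to ``retry_attempts_per_object`` times, re-raising the last error."""
    if retry_attempts_per_object < 1:
        raise ValueError("retry_attempts_per_object must be at least 1")
    for attempt in range(1, retry_attempts_per_object + 1):
        try:
            return func()
        except Exception:
            # Give up only once the final allowed attempt has failed.
            if attempt == retry_attempts_per_object:
                raise
    raise AssertionError("unreachable")
```

A call such as `call_with_retries(insert_one, retry_attempts_per_object=3)` reads as "try three times", whereas `no_retry_attempts_per_object=3` can be misread as switching retries off.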
##########
airflow/providers/weaviate/hooks/weaviate.py:
##########
@@ -135,22 +139,40 @@ def create_schema(self, schema_json: dict[str, Any]) -> None:
client = self.conn
client.schema.create(schema_json)
+ @staticmethod
+ def check_http_error_should_retry(exc: BaseException):
+ return isinstance(exc, requests.HTTPError) and not exc.response.ok
+
def batch_data(
- self, class_name: str, data: list[dict[str, Any]], batch_config_params: dict[str, Any] | None = None
+ self,
+ class_name: str,
+ data: list[dict[str, Any]] | pd.DataFrame,
+ batch_config_params: dict[str, Any] | None = None,
+ vector_col: str = "Vector",
Review Comment:
I think it would make sense to add a docstring to this public hook method
that explains what these params mean.
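As a rough sketch of what such a docstring could look like, see below. It is written as a standalone stub (no `self`, no body), it uses the renamed `retry_attempts_per_object`, and the parameter descriptions are my reading of the signature rather than confirmed behaviour, so please correct them where they are wrong.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    import pandas as pd  # only needed for the type annotation


def batch_data(
    class_name: str,
    data: list[dict[str, Any]] | pd.DataFrame,
    batch_config_params: dict[str, Any] | None = None,
    vector_col: str = "Vector",
    retry_attempts_per_object: int = 5,
) -> None:
    """Add multiple objects to a Weaviate class in batches.

    NOTE: the descriptions below are assumptions inferred from the signature.

    :param class_name: Name of the Weaviate class the objects are stored in.
    :param data: The objects to store, as a list of dicts or a pandas dataframe.
    :param batch_config_params: Extra parameters used to configure the batch client.
    :param vector_col: Name of the column or key that holds a custom vector.
    :param retry_attempts_per_object: Number of times to try inserting one object.
    """
```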