ylnsnv opened a new pull request, #35547:
URL: https://github.com/apache/airflow/pull/35547

   ## Summary
   
   This PR refines the `OpenAIEmbeddingOperator` so that its type annotations 
and validation match the input types accepted for generating embeddings 
([per the official OpenAI 
API](https://github.com/apache/airflow/assets/42522942/b15eef98-dcb9-42a4-b828-fba8fa9a47d1)).
 The operator previously accepted a string or a list of any type. It now 
explicitly handles strings, lists of strings, lists of integers, and lists of 
lists of integers.
   
   ## Changes
   
   - Changed the type annotation from `str | list[Any]` to [`str | list[str] | 
list[int] | 
list[list[int]]`](https://github.com/apache/airflow/assets/42522942/b15eef98-dcb9-42a4-b828-fba8fa9a47d1).
   - Refined input validation to ensure `input_text` is non-empty and matches 
one of the expected types.
   - Expanded unit tests to cover the newly supported input types and to ensure 
the operator behaves as expected.
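
The validation described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `validate_input_text`, the choice of `ValueError`, and the rejection of `bool` values and empty inner lists are all assumptions made for the example.

```python
from __future__ import annotations


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python; this sketch chooses to reject it.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_input_text(
    input_text: str | list[str] | list[int] | list[list[int]],
) -> None:
    """Hypothetical check: raise ValueError unless input_text is non-empty
    and matches one of the four annotated shapes."""
    if isinstance(input_text, str):
        if not input_text:
            raise ValueError("input_text must not be an empty string")
        return
    if not isinstance(input_text, list) or not input_text:
        raise ValueError("input_text must be a non-empty string or list")
    # list[str]
    if all(isinstance(item, str) for item in input_text):
        return
    # list[int]
    if all(_is_int(item) for item in input_text):
        return
    # list[list[int]], where every inner list is non-empty
    if all(
        isinstance(item, list) and item and all(_is_int(tok) for tok in item)
        for item in input_text
    ):
        return
    raise ValueError(
        "input_text must be str, list[str], list[int], or list[list[int]]"
    )
```

Mixed lists such as `["a", 1]` fall through every branch and are rejected, which is the main practical difference from the earlier `list[Any]` annotation.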
   
   ## Impact
   
   The more specific type annotations give users clearer guidance on what 
data can be passed for embedding generation. The update also makes the 
operator usable in more scenarios, such as passing integer sequences (for 
example, pre-tokenized text), which are common in machine learning and NLP 
tasks.
   
   ## Tests
   
   - Extended the test suite to include cases for each of the supported input 
types, ensuring robustness.
   - Added negative test cases to verify that the operator raises exceptions 
for invalid inputs as expected.
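
As an illustration of the positive and negative cases listed above, a table-driven check might look like the sketch below. It is not the test code from this PR: `run_cases`, `stand_in_validator`, and the sample values are hypothetical, and the stand-in only mimics a non-empty `str`/`list` check rather than calling the real operator.

```python
from __future__ import annotations

from typing import Any, Callable

# One representative value per supported input type named in this PR.
VALID_INPUTS: list[Any] = [
    "a sentence to embed",
    ["first", "second"],
    [101, 2023, 102],
    [[101, 102], [103]],
]

# Values assumed here to be invalid: empty inputs and unsupported types.
INVALID_INPUTS: list[Any] = ["", [], None, 3.14, {"text": "x"}]


def run_cases(validator: Callable[[Any], None]) -> list[str]:
    """Return failure descriptions; an empty list means every case passed."""
    failures: list[str] = []
    for value in VALID_INPUTS:
        try:
            validator(value)
        except Exception as exc:
            failures.append(f"valid input rejected: {value!r} ({exc})")
    for value in INVALID_INPUTS:
        try:
            validator(value)
        except Exception:
            continue  # any exception counts as a correct rejection
        failures.append(f"invalid input accepted: {value!r}")
    return failures


def stand_in_validator(value: Any) -> None:
    """Hypothetical placeholder for the operator's check; not Airflow code."""
    if not value or not isinstance(value, (str, list)):
        raise ValueError(f"unsupported input_text: {value!r}")
```

Calling `run_cases(stand_in_validator)` returns an empty list, while a validator that accepts everything produces one failure per entry in `INVALID_INPUTS`.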
   
   With these changes, the `OpenAIEmbeddingOperator` becomes more intuitive and 
versatile, allowing for seamless integration into various data processing 
pipelines within Airflow environments.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

