ylnsnv opened a new pull request, #35547: URL: https://github.com/apache/airflow/pull/35547
## Summary

This PR refines the `OpenAIEmbeddingOperator` so that it accurately reflects and supports the full range of input types it can handle when generating embeddings ([per the official OpenAI API](https://github.com/apache/airflow/assets/42522942/b15eef98-dcb9-42a4-b828-fba8fa9a47d1)). The operator previously accepted a string or a list of any type. It now has stricter type annotations and validation, and explicitly handles strings, lists of strings, lists of integers, and lists of lists of integers.

## Changes

- Changed the type annotation from `str | list[Any]` to [`str | list[str] | list[int] | list[list[int]]`](https://github.com/apache/airflow/assets/42522942/b15eef98-dcb9-42a4-b828-fba8fa9a47d1).
- Refined input validation to ensure `input_text` is non-empty and matches one of the expected types.
- Expanded unit tests to cover the newly supported input types and to ensure the operator behaves as expected.

## Impact

The more precise input types give users clearer guidance on what data can be passed for embedding generation. The operator can now be used in a broader range of scenarios, such as passing pre-tokenized input (integer sequences), which is common in machine learning and NLP tasks.

## Tests

- Extended the test suite to include cases for each of the supported input types.
- Added negative test cases to verify that the operator raises exceptions for invalid inputs.

With these changes, the `OpenAIEmbeddingOperator` is more explicit about what it accepts and easier to integrate into data processing pipelines in Airflow.
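For illustration, the validation described above could look roughly like the sketch below. This is not the code from the PR: the function name `validate_embedding_input`, the choice of `ValueError`, and the rejection of `bool` values and empty inner lists are all assumptions made for the example.

```python
from __future__ import annotations


def validate_embedding_input(
    input_text: str | list[str] | list[int] | list[list[int]],
) -> None:
    """Raise ValueError unless input_text is non-empty and has a supported shape.

    Hypothetical helper; the operator's real checks may differ.
    """
    # A plain string is accepted as long as it is not empty.
    if isinstance(input_text, str):
        if not input_text:
            raise ValueError("input_text must not be an empty string")
        return

    # Anything else must be a non-empty list.
    if not isinstance(input_text, list) or not input_text:
        raise ValueError("input_text must be a non-empty string or list")

    def is_token(value: object) -> bool:
        # bool is a subclass of int, so it is excluded explicitly (assumption).
        return isinstance(value, int) and not isinstance(value, bool)

    # list[str]
    if all(isinstance(item, str) for item in input_text):
        return
    # list[int]
    if all(is_token(item) for item in input_text):
        return
    # list[list[int]], with every inner list non-empty (assumption)
    if all(
        isinstance(item, list) and item and all(is_token(tok) for tok in item)
        for item in input_text
    ):
        return

    raise ValueError(
        "input_text must be str, list[str], list[int], or list[list[int]]"
    )
```

A call such as `validate_embedding_input([[1, 2], [3]])` returns silently, while mixed input such as `["a", 1]` raises `ValueError`. Checking each shape with its own `all(...)` pass keeps mixed lists from slipping through, which a single per-element `isinstance(item, (str, int, list))` test would allow.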
