SEZ9 opened a new issue, #8271: URL: https://github.com/apache/seatunnel/issues/8271
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) issues and found no similar feature requirement.

### Description

1. Add support for the Amazon Titan model in the embedding `model_provider` configuration.
2. Implement batch inference in the embedding process, sending data to the model API in batches rather than row by row.
3. Detect whether a batch was sent successfully, and apply fault tolerance when it was not.

### Usage Scenario

When vectorizing text at large scale into a vector database, users need to embed data efficiently and at low cost. For example:

1. User comment analysis: millions or tens of millions of rows must be vectorized in a single run.
2. Image retrieval: users often vectorize hundreds of thousands or millions of images into the database for subsequent approximate nearest-neighbor search.

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes, I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
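The batching and fault-tolerance behavior described in points 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not SeaTunnel code: the class name `BatchEmbedder`, the `embedBatch` callback, and the retry/fallback policy are all hypothetical assumptions about how such a feature might work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of batched embedding with fault tolerance:
// rows are grouped into fixed-size batches; a failed batch is retried,
// and if retries are exhausted it falls back to row-by-row calls so
// one bad row cannot fail the whole batch.
public class BatchEmbedder {
    private final Function<List<String>, List<float[]>> embedBatch; // model API call (assumed)
    private final int batchSize;
    private final int maxRetries;

    public BatchEmbedder(Function<List<String>, List<float[]>> embedBatch,
                         int batchSize, int maxRetries) {
        this.embedBatch = embedBatch;
        this.batchSize = batchSize;
        this.maxRetries = maxRetries;
    }

    public List<float[]> embedAll(List<String> texts) {
        List<float[]> out = new ArrayList<>(texts.size());
        // Send data to the model API in batches instead of one row at a time.
        for (int i = 0; i < texts.size(); i += batchSize) {
            List<String> batch = texts.subList(i, Math.min(i + batchSize, texts.size()));
            out.addAll(embedWithRetry(batch));
        }
        return out;
    }

    private List<float[]> embedWithRetry(List<String> batch) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return embedBatch.apply(batch); // detect success: no exception thrown
            } catch (RuntimeException e) {
                // transient failure: retry the whole batch
            }
        }
        // Fault tolerance: degrade to single-row requests so only the
        // genuinely bad rows are skipped.
        List<float[]> out = new ArrayList<>(batch.size());
        for (String row : batch) {
            try {
                out.addAll(embedBatch.apply(List.of(row)));
            } catch (RuntimeException e) {
                out.add(new float[0]); // placeholder for the row that failed
            }
        }
        return out;
    }
}
```

A real implementation would also need to respect the provider's per-request limits (Amazon Titan caps batch size and input length), which is why `batchSize` is configurable here.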
