SEZ9 opened a new issue, #16497: URL: https://github.com/apache/dolphinscheduler/issues/16497
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

For large language model (LLM) hosting services, DolphinScheduler's data scheduling tasks present two key opportunities:

1. Integrating services like Amazon Bedrock as a task plugin. Bedrock supports fine-tuning of LLMs, so after upstream data is orchestrated and processed, a task could directly invoke Bedrock for fine-tuning and model evaluation, e.g. when fine-tuning models such as LLaMA 3 or Claude 3.
2. Leveraging the multimodal capabilities of LLMs to handle unstructured data such as images and text, extracting and structuring the output for downstream processing.

### Use case

1. **Fine-tuning LLMs with Amazon Bedrock**
   - Use case: Automate the fine-tuning of large language models (e.g., LLaMA 3, Claude 3) using Amazon Bedrock.
   - Scenario: After processing and orchestrating upstream data, DolphinScheduler triggers a task that uses Bedrock's fine-tuning service. The task fine-tunes the LLM on specific datasets and performs model evaluation, all within a single workflow.
2. **Multimodal data processing**
   - Use case: Process and structure unstructured multimodal data using LLMs.
   - Scenario: DolphinScheduler integrates LLMs to handle unstructured data such as images, text, and videos. The LLM extracts meaningful information and converts it into structured formats for downstream applications, such as databases or analytical tools.
3. **Automated content moderation**
   - Use case: Implement content moderation workflows that use LLMs to analyze and filter content.
   - Scenario: Content from various sources (text, images, videos) is scheduled for moderation tasks. DolphinScheduler orchestrates these tasks, where LLMs analyze the content, detect inappropriate material, and flag or remove it according to predefined rules.
4. **Real-time data enrichment**
   - Use case: Enhance real-time data streams with contextual information using LLMs.
   - Scenario: DolphinScheduler orchestrates data streams from IoT devices, social media, or other sources. LLMs enrich this data with additional context, such as sentiment analysis or object recognition, before forwarding it to real-time analytics systems.
5. **Automated document processing**
   - Use case: Streamline the processing of large volumes of documents using LLMs.
   - Scenario: Documents such as contracts, reports, or emails are ingested by DolphinScheduler and passed to LLMs for processing. The LLMs extract key information, summarize content, and categorize documents, automating tasks like compliance checks or data entry.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
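To make use case 1 concrete, a Bedrock task plugin would essentially assemble and submit a model customization job once upstream processing finishes. The sketch below only builds the request payload; the bucket URIs, role ARN, model identifier, and hyperparameter values are all placeholders, and the actual submission via the `boto3` Bedrock client is shown in comments rather than executed:

```python
# Sketch of what a Bedrock fine-tuning task might submit after upstream
# data processing completes. All names (S3 paths, role ARN, model IDs,
# hyperparameter values) are placeholders, not real resources.


def build_finetune_job_request(job_name: str,
                               base_model_id: str,
                               training_s3_uri: str,
                               output_s3_uri: str,
                               role_arn: str) -> dict:
    """Assemble keyword arguments for a Bedrock model customization job."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-model",
        "roleArn": role_arn,
        "baseModelIdentifier": base_model_id,
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "hyperParameters": {"epochCount": "1", "batchSize": "1"},
    }


if __name__ == "__main__":
    request = build_finetune_job_request(
        job_name="llama3-ft-demo",
        base_model_id="meta.llama3-8b-instruct-v1:0",
        training_s3_uri="s3://my-bucket/train.jsonl",
        output_s3_uri="s3://my-bucket/output/",
        role_arn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    )
    # In a real task plugin this request would be submitted and polled, e.g.:
    #   bedrock = boto3.client("bedrock")
    #   job = bedrock.create_model_customization_job(**request)
    #   bedrock.get_model_customization_job(jobIdentifier=job["jobArn"])
    print(request["jobName"])
```

Keeping the payload construction separate from the AWS call would let the plugin validate task parameters (and unit-test them) before touching the Bedrock API.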

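Use case 2 hinges on turning free-form model output into structured records for downstream systems. A minimal, model-agnostic sketch of that post-processing step (the field names and the convention that the model is prompted to emit JSON are assumptions, not an existing DolphinScheduler API):

```python
# Post-processing sketch for multimodal/unstructured extraction: an LLM is
# prompted to reply with JSON, and the task must validate that reply before
# handing it downstream. Field names here are illustrative only.
import json


def parse_structured_output(raw_reply: str, required_fields: list) -> dict:
    """Extract the first JSON object from a model reply and validate it.

    Models often wrap JSON in prose or markdown fences, so we locate the
    outermost braces instead of json.loads()-ing the whole reply.
    """
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    record = json.loads(raw_reply[start:end + 1])
    missing = [f for f in required_fields if f not in record]
    if missing:
        # Failing loudly lets the scheduler retry the task or route it
        # to manual review instead of passing bad records downstream.
        raise ValueError(f"model reply missing fields: {missing}")
    return record


if __name__ == "__main__":
    reply = 'Sure, here is the result:\n{"title": "Invoice 42", "amount": 129.5}'
    print(parse_structured_output(reply, ["title", "amount"]))
```

Raising on malformed replies is deliberate: DolphinScheduler's task retry and failure-handling policies can then drive reprocessing, rather than silently forwarding incomplete records to databases or analytics tools.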