SEZ9 opened a new issue, #16497: URL: https://github.com/apache/dolphinscheduler/issues/16497
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

For large language model (LLM) hosting services, DolphinScheduler's data scheduling tasks present two key opportunities:

1. Integrating services like Amazon Bedrock as a task plugin. Bedrock supports fine-tuning of LLMs, so after upstream data is orchestrated and processed, a task could directly invoke Bedrock for fine-tuning and model evaluation, e.g. when fine-tuning models such as LLaMA 3 or Claude 3.
2. Leveraging the multimodal capabilities of LLMs to handle unstructured data such as images and text, extracting and structuring the output for downstream processing.

### Use case

1. **Fine-tuning LLMs with Amazon Bedrock**
   - Use case: Automate the fine-tuning of large language models (e.g., LLaMA 3, Claude 3) using Amazon Bedrock.
   - Scenario: After processing and orchestrating upstream data, DolphinScheduler triggers a task that uses Bedrock's fine-tuning service. The task fine-tunes the LLM on specific datasets and performs model evaluation, all within a single workflow.
2. **Multimodal data processing**
   - Use case: Process and structure unstructured multimodal data using LLMs.
   - Scenario: DolphinScheduler integrates LLMs to handle unstructured data such as images, text, and videos. The LLM extracts meaningful information and converts it into structured formats for downstream applications, such as databases or analytical tools.
3. **Automated content moderation**
   - Use case: Implement content moderation workflows that use LLMs to analyze and filter content.
   - Scenario: Content from various sources (text, images, videos) is scheduled for moderation tasks. DolphinScheduler orchestrates these tasks, where LLMs analyze the content, detect inappropriate material, and flag or remove it according to predefined rules.
4. **Real-time data enrichment**
   - Use case: Enhance real-time data streams with contextual information using LLMs.
   - Scenario: DolphinScheduler orchestrates data streams from IoT devices, social media, or other sources. LLMs enrich this data with additional context, such as sentiment analysis or object recognition, before forwarding it to real-time analytics systems.
5. **Automated document processing**
   - Use case: Streamline the processing of large volumes of documents using LLMs.
   - Scenario: Documents such as contracts, reports, or emails are ingested by DolphinScheduler and passed to LLMs for processing. The LLMs extract key information, summarize content, and categorize documents, automating tasks like compliance checks or data entry.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
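To make use case 1 concrete, a Bedrock task plugin would essentially assemble and submit a model customization job once upstream processing finishes. The sketch below only builds the request payload; the bucket URIs, role ARN, model identifier, and hyperparameter values are all placeholders, and the actual submission via the `boto3` Bedrock client is shown in comments rather than executed:

```python
# Sketch of what a Bedrock fine-tuning task might submit after upstream
# data processing completes. All names (S3 paths, role ARN, model IDs,
# hyperparameter values) are placeholders, not real resources.


def build_finetune_job_request(job_name: str,
                               base_model_id: str,
                               training_s3_uri: str,
                               output_s3_uri: str,
                               role_arn: str) -> dict:
    """Assemble keyword arguments for a Bedrock model customization job."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-model",
        "roleArn": role_arn,
        "baseModelIdentifier": base_model_id,
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "hyperParameters": {"epochCount": "1", "batchSize": "1"},
    }


if __name__ == "__main__":
    request = build_finetune_job_request(
        job_name="llama3-ft-demo",
        base_model_id="meta.llama3-8b-instruct-v1:0",
        training_s3_uri="s3://my-bucket/train.jsonl",
        output_s3_uri="s3://my-bucket/output/",
        role_arn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    )
    # In a real task plugin this request would be submitted and polled, e.g.:
    #   bedrock = boto3.client("bedrock")
    #   job = bedrock.create_model_customization_job(**request)
    #   bedrock.get_model_customization_job(jobIdentifier=job["jobArn"])
    print(request["jobName"])
```

Keeping the payload construction separate from the AWS call would let the plugin validate task parameters (and unit-test them) before touching the Bedrock API.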

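Use case 2 hinges on turning free-form model output into structured records for downstream systems. A minimal, model-agnostic sketch of that post-processing step (the field names and the convention that the model is prompted to emit JSON are assumptions, not an existing DolphinScheduler API):

```python
# Post-processing sketch for multimodal/unstructured extraction: an LLM is
# prompted to reply with JSON, and the task must validate that reply before
# handing it downstream. Field names here are illustrative only.
import json


def parse_structured_output(raw_reply: str, required_fields: list) -> dict:
    """Extract the first JSON object from a model reply and validate it.

    Models often wrap JSON in prose or markdown fences, so we locate the
    outermost braces instead of json.loads()-ing the whole reply.
    """
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    record = json.loads(raw_reply[start:end + 1])
    missing = [f for f in required_fields if f not in record]
    if missing:
        # Failing loudly lets the scheduler retry the task or route it
        # to manual review instead of passing bad records downstream.
        raise ValueError(f"model reply missing fields: {missing}")
    return record


if __name__ == "__main__":
    reply = 'Sure, here is the result:\n{"title": "Invoice 42", "amount": 129.5}'
    print(parse_structured_output(reply, ["title", "amount"]))
```

Raising on malformed replies is deliberate: DolphinScheduler's task retry and failure-handling policies can then drive reprocessing, rather than silently forwarding incomplete records to databases or analytics tools.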