norrishuang opened a new issue, #18070:
URL: https://github.com/apache/dolphinscheduler/issues/18070

### Search before asking

- [x] I have searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
### Description

DolphinScheduler currently supports an Amazon EMR on EC2 task type, which manages EC2-based clusters and runs computing tasks on them. However, it does not support [Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html), a serverless deployment option that lets users run Spark and Hive workloads without managing cluster infrastructure.

EMR Serverless automatically provisions and scales compute resources on demand. This gives a simpler operational model and reduces cost through pay-per-use pricing. For many use cases it has become the preferred way to run EMR workloads.
This feature request proposes adding a new `EMR_SERVERLESS` task type that enables users to:

1. **Submit jobs**: submit Spark or Hive jobs to a pre-created EMR Serverless application via the [StartJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html).
2. **Monitor job status**: automatically poll the job state (SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS/FAILED/CANCELLED) via the [GetJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetJobRun.html).
3. **Cancel jobs**: automatically cancel the running job via the [CancelJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CancelJobRun.html) when the DolphinScheduler task is killed.
4. **Failover recovery**: resume tracking of running jobs after a Worker node restarts, rather than losing track of them.

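
The four capabilities above can be sketched as a small, SDK-agnostic Python helper. This is a minimal sketch under stated assumptions, not the proposed plugin code (which would be Java inside DolphinScheduler): `get_state`, `cancel`, and `submit` are hypothetical callables standing in for GetJobRun, CancelJobRun, and StartJobRun; `JobRunHandle` is a hypothetical record of the application ID and job run ID that a Worker would need to persist for failover; and `example-run-id` is a placeholder. Any state other than SUCCESS, FAILED, or CANCELLED (including CANCELLING, which the API reports while a cancellation is in progress) is treated as still in progress.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Terminal job run states listed in this issue. Every other state (SUBMITTED,
# PENDING, SCHEDULED, RUNNING, CANCELLING, ...) is treated as in progress.
TERMINAL_STATES = frozenset({"SUCCESS", "FAILED", "CANCELLED"})


@dataclass(frozen=True)
class JobRunHandle:
    """Hypothetical record a Worker would persist so it can recover after a restart."""
    application_id: str
    job_run_id: str


StateFn = Callable[[JobRunHandle], str]


def resume_or_submit(saved: Optional[JobRunHandle],
                     submit: Callable[[], JobRunHandle]) -> JobRunHandle:
    """Failover: keep tracking a persisted job run instead of submitting a new one."""
    return saved if saved is not None else submit()


def wait_for_job_run(get_state: StateFn, handle: JobRunHandle,
                     poll_interval_s: float = 10.0,
                     max_polls: Optional[int] = None) -> str:
    """Poll get_state (stand-in for GetJobRun) until a terminal state is reached."""
    polls = 0
    while True:
        state = get_state(handle)
        if state in TERMINAL_STATES:
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job run {handle.job_run_id} still {state}")
        time.sleep(poll_interval_s)


def cancel_if_active(get_state: StateFn, cancel: Callable[[JobRunHandle], None],
                     handle: JobRunHandle) -> bool:
    """On task kill: call cancel (stand-in for CancelJobRun) only if the run is active."""
    state = get_state(handle)
    if state in TERMINAL_STATES or state == "CANCELLING":
        return False
    cancel(handle)
    return True


if __name__ == "__main__":
    # Simulated GetJobRun responses following the state sequence in this issue.
    states = iter(["SUBMITTED", "PENDING", "SCHEDULED", "RUNNING", "SUCCESS"])
    handle = resume_or_submit(
        None, lambda: JobRunHandle("00fkht2eodujab09", "example-run-id"))
    print(wait_for_job_run(lambda h: next(states), handle, poll_interval_s=0))  # SUCCESS
```

Keeping the AWS calls behind plain callables lets the polling, kill, and failover logic be tested without credentials; in a real task these callables would wrap an AWS SDK client.
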
### Task Parameters

| Parameter | Description |
|-----------|-------------|
| Application Id | ID of the EMR Serverless application to run the job on (e.g. `00fkht2eodujab09`) |
| Execution Role Arn | ARN of the IAM role that the job run assumes during execution |
| Job Name | Optional name used to identify the job run |
| StartJobRunRequest JSON | JSON containing the `JobDriver` and `ConfigurationOverrides` for the job |
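
To make the last row concrete, the Python sketch below shows how the four parameters could be combined into a single StartJobRun request. It is a hypothetical helper, not the proposed plugin's code: key names follow the camelCase used in the StartJobRun API reference (`jobDriver`, `configurationOverrides`, `sparkSubmit`, `entryPoint`), whereas this issue writes `JobDriver` and `ConfigurationOverrides`, so the exact casing the task will accept is an assumption. The S3 paths, script name, account ID, and role name are placeholders.

```python
import json
from typing import Any, Dict, Optional

# JobDriver is a union in the API: exactly one of these members may be set.
JOB_DRIVER_TYPES = ("sparkSubmit", "hive")


def build_start_job_run_request(application_id: str, execution_role_arn: str,
                                request_json: str,
                                job_name: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge the task's form fields into the user's JSON."""
    request = json.loads(request_json)
    if not isinstance(request, dict):
        raise ValueError("StartJobRunRequest JSON must be a JSON object")
    driver = request.get("jobDriver")
    if not isinstance(driver, dict) or len(driver) != 1:
        raise ValueError("jobDriver must set exactly one of sparkSubmit or hive")
    kind = next(iter(driver))
    if kind not in JOB_DRIVER_TYPES:
        raise ValueError(f"unsupported jobDriver type: {kind}")
    # Values from the dedicated task fields override duplicates in the JSON.
    # (applicationId is a URI path parameter in the REST API; SDKs expose it
    # as an ordinary request field.)
    request["applicationId"] = application_id
    request["executionRoleArn"] = execution_role_arn
    if job_name:
        request["name"] = job_name
    return request


# Placeholder bucket, script and role names, for illustration only.
EXAMPLE_JSON = json.dumps({
    "jobDriver": {
        "sparkSubmit": {
            "entryPoint": "s3://example-bucket/jobs/etl.py",
            "entryPointArguments": ["--date", "2024-01-01"],
            "sparkSubmitParameters": "--conf spark.executor.memory=4g",
        }
    },
    "configurationOverrides": {
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://example-bucket/logs/"}
        }
    },
})

if __name__ == "__main__":
    req = build_start_job_run_request(
        "00fkht2eodujab09",
        "arn:aws:iam::123456789012:role/ExampleJobRole",
        EXAMPLE_JSON,
        job_name="daily-etl",
    )
    print(json.dumps(req, indent=2))
```

Letting the dedicated form fields override the JSON avoids two sources of truth for the application and role, and rejecting a `jobDriver` that sets zero or several members surfaces the API's union constraint before any AWS call is made.
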
   
   
### Use case

- Data engineering teams running scheduled Spark ETL pipelines without managing EMR clusters
- Ad-hoc Hive query workloads that benefit from serverless auto-scaling
- Cost-sensitive environments where pay-per-use is preferred over always-on clusters
- Organizations migrating from EMR on EC2 to EMR Serverless for operational simplicity
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)