norrishuang opened a new issue, #18070: URL: https://github.com/apache/dolphinscheduler/issues/18070
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

DolphinScheduler currently supports Amazon EMR on EC2 task type for managing EC2-based clusters and running computing tasks. However, there is no support for [Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html), which is a serverless deployment option that allows users to run Spark and Hive workloads without managing cluster infrastructure.

EMR Serverless automatically provisions and scales compute resources on demand, offering a simpler operational model and cost optimization through pay-per-use pricing. It has become the recommended way to run EMR workloads for many use cases.

This feature request proposes adding a new `EMR_SERVERLESS` task type that enables users to:

1. **Submit jobs**: Submit Spark or Hive jobs to a pre-created EMR Serverless application via the [StartJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html)
2. **Monitor job status**: Automatically poll job state (SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS/FAILED/CANCELLED) via the [GetJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetJobRun.html)
3. **Cancel jobs**: Automatically cancel running jobs when a DolphinScheduler task is killed, via the [CancelJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CancelJobRun.html)
4. **Failover recovery**: Recover tracking of running jobs when Worker nodes restart

### Task Parameters

| Parameter | Description |
|-----------|-------------|
| Application Id | EMR Serverless application ID (e.g. `00fkht2eodujab09`) |
| Execution Role Arn | IAM role ARN for job execution |
| Job Name | Optional job name for identification |
| StartJobRunRequest JSON | JSON containing `JobDriver` and `ConfigurationOverrides` for the job |

### Use case

- Data engineering teams running scheduled Spark ETL pipelines without managing EMR clusters
- Ad-hoc Hive query workloads that benefit from serverless auto-scaling
- Cost-sensitive environments where pay-per-use is preferred over always-on clusters
- Organizations migrating from EMR on EC2 to EMR Serverless for operational simplicity

### Related issues

_No response_

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
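### Sketch of the proposed task lifecycle (illustrative)

To make the four behaviours listed in the Description concrete (submit, poll, cancel, failover recovery), here is a minimal Java sketch of how they could fit together. Everything in it is an assumption made for illustration: `EmrServerlessTaskSketch`, `JobRunClient` and `ScriptedClient` are hypothetical names, not the AWS SDK and not DolphinScheduler's task plugin API. In a real plugin the three interface methods would presumably delegate to StartJobRun, GetJobRun and CancelJobRun; here an in-memory double replays a scripted state sequence so the flow can be run without AWS credentials.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class EmrServerlessTaskSketch {

    /** Job run states as listed in the issue description. */
    enum JobRunState { SUBMITTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED, CANCELLED }

    static final Set<JobRunState> TERMINAL =
            EnumSet.of(JobRunState.SUCCESS, JobRunState.FAILED, JobRunState.CANCELLED);

    /** Hypothetical stand-in for the StartJobRun, GetJobRun and CancelJobRun calls. */
    interface JobRunClient {
        String startJobRun(String applicationId, String executionRoleArn, String requestJson);
        JobRunState getJobRun(String applicationId, String jobRunId);
        void cancelJobRun(String applicationId, String jobRunId);
    }

    private final JobRunClient client;
    private final String applicationId;
    private final long pollIntervalMillis;
    private volatile String jobRunId; // a real task would persist this for failover

    EmrServerlessTaskSketch(JobRunClient client, String applicationId, long pollIntervalMillis) {
        this.client = client;
        this.applicationId = applicationId;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /** Failover recovery: reattach to a job run id saved before the worker restarted. */
    void recover(String savedJobRunId) {
        this.jobRunId = savedJobRunId;
    }

    /** Submits the job once; a recovered job run is tracked, never resubmitted. */
    String submit(String executionRoleArn, String requestJson) {
        if (jobRunId == null) {
            jobRunId = client.startJobRun(applicationId, executionRoleArn, requestJson);
        }
        return jobRunId;
    }

    /** Polls the job state until it reaches SUCCESS, FAILED or CANCELLED. */
    JobRunState awaitTerminalState() {
        if (jobRunId == null) {
            throw new IllegalStateException("no job run to track");
        }
        while (true) {
            JobRunState state = client.getJobRun(applicationId, jobRunId);
            if (TERMINAL.contains(state)) {
                return state;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while tracking " + jobRunId, e);
            }
        }
    }

    /** Called when the DolphinScheduler task is killed: cancel the remote job run. */
    void kill() {
        if (jobRunId != null) {
            client.cancelJobRun(applicationId, jobRunId);
        }
    }

    /** In-memory test double that replays a scripted, non-empty state sequence. */
    static final class ScriptedClient implements JobRunClient {
        private final Deque<JobRunState> script;
        int startCalls;
        boolean cancelled;

        ScriptedClient(List<JobRunState> states) {
            this.script = new ArrayDeque<>(states);
        }

        public String startJobRun(String applicationId, String executionRoleArn, String requestJson) {
            startCalls++;
            return "job-run-" + startCalls;
        }

        public JobRunState getJobRun(String applicationId, String jobRunId) {
            if (cancelled) {
                return JobRunState.CANCELLED;
            }
            // Replay the script; keep returning the last state once it is exhausted.
            return script.size() > 1 ? script.poll() : script.peek();
        }

        public void cancelJobRun(String applicationId, String jobRunId) {
            cancelled = true;
        }
    }

    public static void main(String[] args) {
        ScriptedClient client = new ScriptedClient(List.of(
                JobRunState.SUBMITTED, JobRunState.PENDING, JobRunState.SCHEDULED,
                JobRunState.RUNNING, JobRunState.SUCCESS));
        // The role ARN and the empty request JSON are placeholders, not working values.
        EmrServerlessTaskSketch task = new EmrServerlessTaskSketch(client, "00fkht2eodujab09", 10L);
        task.submit("arn:aws:iam::123456789012:role/example-role", "{}");
        System.out.println(task.awaitTerminalState()); // prints SUCCESS
    }
}
```

Keeping the remote calls behind a small interface is only one possible design; it is used here because it lets the polling, cancel and recovery paths be exercised with a fake client. How the job run id is actually persisted across Worker restarts is left open, since the issue does not specify it.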
