uplsh580 opened a new issue, #61037:
URL: https://github.com/apache/airflow/issues/61037

   ### Description
   
   ## Summary
   
   Currently, the Helm chart processes all DAG bundles in a single DAG 
Processor deployment. This proposal adds an option to create separate 
deployments for each DAG bundle defined in `dagBundleConfigList`, enabling 
independent resource isolation and scaling per bundle.
   
   ## Motivation
   
   ### Problems
   
   1. **Resource Contention**: When multiple DAG bundles run in a single DAG 
Processor deployment, if one bundle consumes excessive CPU/memory, it impacts 
the parsing performance of other bundles.
   
   2. **Parsing Time Delays**: When a specific bundle has many or complex DAG 
files, parsing that bundle takes longer and can delay parsing of other bundles.
   
   3. **Scaling Limitations**: Even when bundles have different resource 
requirements or priorities, all bundles must currently be handled by the same 
deployment.
   
   4. **Lack of Failure Isolation**: Issues occurring in one bundle can affect 
the entire DAG Processor.
   
   ### Use Cases
   - Loading DAGs from multiple Git repositories (different bundle per 
repository)
   - When specific bundles require higher priority or more resources
   
   ## Proposed Solution
   
   ### Feature Overview
   
   Add a new `deployPerBundle` section to the `dagProcessor` configuration to 
create separate Kubernetes deployments for each bundle. When 
`deployPerBundle.enabled` is set to `true`, a separate deployment will be 
created for each bundle in `dagBundleConfigList`.
   
   ### Implementation Details
   
   #### 1. values.yaml Changes
   
   ```yaml
   dagProcessor:
     enabled: ~
   
     dagBundleConfigList:
       - name: dags-folder
         classpath: "airflow.dag_processing.bundles.local.LocalDagBundle"
         kwargs: {}
       # ... more bundles
     
     # Per-bundle deployment option
     # When enabled, creates a separate deployment for each bundle in 
dagBundleConfigList
     deployPerBundle:
       enabled: false
       # Command args template for per-bundle deployments
       # {{ bundleName }} will be replaced with the actual bundle name
       args: ["bash", "-c", "exec airflow dag-processor -B {{ bundleName }}"]
       
       # Per-bundle specific overrides (optional)
       bundleOverrides:
         dags-folder:
           replicas: 2
           resources:
             requests:
               memory: "2Gi"
               cpu: "1000m"
   ```
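   To make the intended expansion concrete, here is an illustrative Python sketch (not actual Helm template code) of how the chart could turn the values above into one deployment per bundle: substituting the `{{ bundleName }}` placeholder in `deployPerBundle.args` and deriving a per-bundle deployment name. The naming scheme `<release>-dag-processor-<bundle>` and the second bundle `git-repo-a` are assumptions for illustration only.

```python
# Hypothetical rendering logic; the real chart would do this in Helm
# templates. The deployment naming scheme below is an assumption.
bundle_config_list = [
    {"name": "dags-folder"},
    {"name": "git-repo-a"},  # hypothetical second bundle for illustration
]

args_template = ["bash", "-c", "exec airflow dag-processor -B {{ bundleName }}"]


def render_deployments(release, bundles, args_template):
    """Produce one deployment spec per bundle, replacing the
    {{ bundleName }} placeholder in the args template."""
    deployments = []
    for bundle in bundles:
        name = bundle["name"]
        args = [a.replace("{{ bundleName }}", name) for a in args_template]
        deployments.append({
            "name": f"{release}-dag-processor-{name}",
            "args": args,
        })
    return deployments


deployments = render_deployments("airflow", bundle_config_list, args_template)
```

   Each resulting deployment runs a DAG Processor scoped to a single bundle, which is what gives the resource and failure isolation described above.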
   
   ### Backward Compatibility
   
   - When `deployPerBundle.enabled: false` (default), existing behavior is 
maintained
   - Per-bundle deployments are only created when `deployPerBundle.enabled: 
true`
   - Existing `dagProcessor` settings (like `replicas`, `resources`, etc.) are 
used as defaults for per-bundle deployments
   - When `deployPerBundle.enabled` is `true`, `deployPerBundle.args` replaces the default `args` for the per-bundle deployments
   - Bundle-specific configuration overrides are possible via 
`deployPerBundle.bundleOverrides`
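   The intended precedence can be sketched as a recursive dictionary merge: values under `deployPerBundle.bundleOverrides.<bundle>` win over the shared `dagProcessor` defaults, and bundles without an override fall back to the defaults unchanged. The concrete default values below are assumptions for illustration; the real chart would implement this in Helm template helpers.

```python
# Shared dagProcessor defaults (example values, not the chart's actual defaults).
base = {
    "replicas": 1,
    "resources": {"requests": {"memory": "512Mi", "cpu": "500m"}},
}

# Per-bundle overrides, mirroring deployPerBundle.bundleOverrides above.
overrides = {
    "dags-folder": {
        "replicas": 2,
        "resources": {"requests": {"memory": "2Gi", "cpu": "1000m"}},
    },
}


def merge(base, override):
    """Recursively merge override into base, override winning on conflicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def settings_for(bundle_name):
    """Effective settings for one per-bundle deployment."""
    return merge(base, overrides.get(bundle_name, {}))
```

   Under this scheme, `settings_for("dags-folder")` picks up the override's 2 replicas and 2Gi memory request, while any bundle without an entry in `bundleOverrides` keeps the shared defaults.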
   
   ## Benefits
   
   1. **Resource Isolation**: Each bundle runs in independent pods, preventing 
resource contention
   2. **Independent Scaling**: Different replica counts can be set per bundle
   3. **Failure Isolation**: Issues in one bundle do not affect others
   4. **Flexible Resource Allocation**: Different resource requests/limits can 
be configured per bundle
   5. **Easier Monitoring**: Metrics and logs can be separated and tracked per 
bundle
   
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

