jliu0812 opened a new issue, #38005:
URL: https://github.com/apache/airflow/issues/38005
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.19.0
### Apache Airflow version
2.8.2
### Operating System
Debian GNU/Linux 12 (bookworm)
### Deployment
Docker-Compose
### Deployment details
Used breeze tool to deploy.
### What happened
When using the `EmrServerlessStartJobOperator`, Airflow's `expand` (dynamic
task mapping) functionality cannot be used. The DAG fails to serialize and a
DAG import error is shown in the webserver. The cause is that
`EmrServerlessStartJobOperator.operator_extra_links` is evaluated while the
task is a `MappedOperator`, and the property assumes it can perform
dictionary-style lookups on the operator; a `MappedOperator` is neither a
dictionary nor iterable, so the lookup raises an error.
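As a minimal, self-contained sketch of the failure mode (not the provider's actual code; the class and function names here are hypothetical), a dict-style membership test against an object that defines neither `__contains__` nor `__iter__` raises exactly this kind of `TypeError`:

```python
class FakeMappedOperator:
    """Hypothetical stand-in for airflow.models.mappedoperator.MappedOperator.

    It deliberately defines neither __contains__ nor __iter__, mirroring
    "not a dictionary and not iterable" from the error described above.
    """


def links_for(operator):
    # A property like operator_extra_links might probe the operator's
    # configuration with an `in` check, which only works on dict-like or
    # iterable objects.
    return "sparkSubmit" in operator


try:
    links_for(FakeMappedOperator())
except TypeError as exc:
    print(exc)  # argument of type 'FakeMappedOperator' is not iterable
```

The same `in` check succeeds on the plain `dict` kwargs of an unmapped operator, which would explain why only the mapped variant fails to serialize.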
### What you think should happen instead
DAG should import successfully without any errors.
### How to reproduce
The following unmapped usage of `EmrServerlessStartJobOperator` works:
```python
from datetime import datetime
from airflow.models.dag import DAG
from airflow.providers.amazon.aws.operators.emr import (
EmrServerlessStartJobOperator,
)
DAG_ID = "example_emr_serverless"
emr_serverless_app_id = "01234abcd"
role_arn = "arn:test"
with DAG(
    dag_id=DAG_ID,
    schedule="@once",
    start_date=datetime(2021, 1, 1),
    tags=["example"],
    catchup=False,
):
    start_job = EmrServerlessStartJobOperator(
        task_id="start_emr_serverless_job",
        application_id=emr_serverless_app_id,
        execution_role_arn=role_arn,
        job_driver={
            "sparkSubmit": {
                "entryPoint": "test.jar",
                "entryPointArguments": ["--arg", "1"],
                "sparkSubmitParameters": "--conf sample",
            }
        },
        configuration_overrides={
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": "s3://test/logs"}
            }
        },
    )
```
By contrast, the following mapped usage of `EmrServerlessStartJobOperator`
fails to serialize:
```python
from datetime import datetime
from airflow.models.dag import DAG
from airflow.providers.amazon.aws.operators.emr import (
EmrServerlessStartJobOperator,
)
DAG_ID = "example_emr_serverless"
emr_serverless_app_id = "01234abcd"
role_arn = "arn:test"
with DAG(
    dag_id=DAG_ID,
    schedule="@once",
    start_date=datetime(2021, 1, 1),
    tags=["example"],
    catchup=False,
):
    start_job = EmrServerlessStartJobOperator.partial(
        task_id="start_emr_serverless_job",
        application_id=emr_serverless_app_id,
        execution_role_arn=role_arn,
        configuration_overrides={
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": "s3://test/logs"}
            }
        },
    ).expand(
        job_driver=[
            {
                "sparkSubmit": {
                    "entryPoint": "test.jar",
                    "entryPointArguments": ["--arg", "1"],
                    "sparkSubmitParameters": "--conf sample",
                }
            },
            {
                "sparkSubmit": {
                    "entryPoint": "test.jar",
                    "entryPointArguments": ["--arg", "2"],
                    "sparkSubmitParameters": "--conf sample",
                }
            },
        ]
    )
```
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)