hsrocks opened a new issue #22037:
URL: https://github.com/apache/airflow/issues/22037


   ### Description
   
   A DataBrew integration would let us add data cleaning and data 
normalization steps to our analytics and machine learning workflows. The 
operator will call the [StartJobRun API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/databrew.html#GlueDataBrew.Client.start_job_run) 
to start a job run. As with other available operators, it will also provide 
an option to wait for completion, for cases where the job must finish before 
the next task is triggered.
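
   The start-then-optionally-wait behaviour described above could look roughly like the sketch below. It is an illustration only, not the proposed operator: the function name `run_databrew_job`, its parameters, and the polling interval are assumptions made for this example. It relies on the boto3 DataBrew client calls `start_job_run` and `describe_job_run`, and takes the client as an argument so it can be exercised without AWS access.

```python
import time

# States after which DescribeJobRun will not report further progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def run_databrew_job(client, job_name, wait_for_completion=True,
                     poll_interval=30, sleep=time.sleep):
    """Start a DataBrew job run and optionally block until it finishes.

    `client` is expected to behave like boto3.client("databrew").
    All names here are hypothetical; this is not an Airflow API.
    """
    # StartJobRun returns the identifier of the new run.
    run_id = client.start_job_run(Name=job_name)["RunId"]
    if not wait_for_completion:
        return run_id, None

    # Poll DescribeJobRun until the run reaches a terminal state.
    while True:
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_interval)

    if state != "SUCCEEDED":
        raise RuntimeError(
            f"DataBrew job {job_name!r} run {run_id} ended in state {state}"
        )
    return run_id, state
```

   With real credentials this would be called as `run_databrew_job(boto3.client("databrew"), "my-job")`, where `my-job` is a placeholder job name. An actual operator would wrap this logic in `execute()` and obtain the client from an AWS hook.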
   
   ### Use case/motivation
   
   AWS Glue DataBrew is a visual data preparation tool that enables users to 
clean and normalize data without writing any code. Once a Glue DataBrew 
project has been set up, an ML or analytics engineer can use this operator to 
run its jobs. It adds value in use cases where we have to normalize or clean 
data before triggering a SageMaker training or inference job, or where we 
want to validate the results with Glue or Athena once the cleaned data is 
present.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


