LaPetiteSouris opened a new pull request, #24554:
URL: https://github.com/apache/airflow/pull/24554

   ## What
   
   - Add `SqsBatchSensor` which polls SQS multiple times before returning the 
results
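   
   A minimal sketch of the "poll several times, then return" idea, not the code 
in this PR. `poll_batch`, `num_polls` and `wait_time_seconds` are hypothetical 
names, and `client` is assumed to be any object exposing a boto3-style 
`receive_message` method. Stopping at the first empty response is also an 
assumption of this sketch.

```python
from typing import Any, Dict, List

# SQS returns at most 10 messages for a single ReceiveMessage call.
SQS_MAX_PER_RECEIVE = 10


def poll_batch(
    client: Any,
    queue_url: str,
    num_polls: int = 5,
    wait_time_seconds: int = 1,
) -> List[Dict[str, Any]]:
    """Call ReceiveMessage up to `num_polls` times and return every message seen.

    Hypothetical helper: `client` only needs a boto3-style `receive_message`.
    """
    collected: List[Dict[str, Any]] = []
    for _ in range(num_polls):
        response = client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=SQS_MAX_PER_RECEIVE,
            WaitTimeSeconds=wait_time_seconds,
        )
        # boto3 omits the "Messages" key when nothing was received.
        messages = response.get("Messages", [])
        if not messages:
            # Assumption: treat an empty response as "queue drained for now".
            break
        collected.extend(messages)
    return collected
```

   With `num_polls=5`, one sensor execution in this sketch can collect up to 50 
messages instead of 10.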
   
   
   
   ![Untitled Diagram 
drawio](https://user-images.githubusercontent.com/6369285/174552305-689b5c06-dc8d-4704-85cb-f0e34ad523fe.png)
   
   
   ## Why
   
   - SQS allows at most [10 messages per 
batch](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-batch-api-actions.html)
   - The current `SqsSensor` performs one poll per `poke`, so it retrieves at 
most 10 messages per execution
   - In many cases, hundreds of messages may be stuck in the queue while 
`SqsSensor` dequeues only 10 per execution, so draining the queue faster 
requires adding multiple `SqsSensor` tasks. Each `SqsSensor` task is a sensor 
that runs constantly (every minute, for example) and consumes worker execution 
slots.
   
   - In other cases, we have tasks that should be triggered with arguments 
retrieved from SQS. Example: the SQS messages contain IDs of items to be 
processed, and the next task is triggered with the argument 
`--id_list=[id1,id2,id3,...]`. With 10 messages per batch, the downstream task 
has to be triggered multiple times, each time with at most 10 IDs.
   
   - We want to introduce the notion of a batch for SQS, similar to the way the 
[AWS Lambda integration with 
SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) processes SQS 
messages in batches.
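   
   To illustrate the `--id_list` case above, here is a small sketch of turning 
one collected batch into a single downstream argument. It assumes each message 
body is JSON with an `id` field. That payload shape and the helper name 
`build_id_list_arg` are hypothetical and are not defined by this PR.

```python
import json
from typing import Any, Dict, List


def build_id_list_arg(messages: List[Dict[str, Any]]) -> str:
    """Build one `--id_list=[...]` argument from a batch of SQS messages.

    Assumption: every message "Body" is a JSON object with an "id" field.
    """
    ids = [str(json.loads(message["Body"])["id"]) for message in messages]
    return "--id_list=[" + ",".join(ids) + "]"


# Example batch shaped like the "Messages" list of a ReceiveMessage response.
example_messages = [
    {"Body": '{"id": "id1"}'},
    {"Body": '{"id": "id2"}'},
    {"Body": '{"id": "id3"}'},
]
example_arg = build_id_list_arg(example_messages)
```

   With a larger batch, the downstream task can be triggered once with all 
collected IDs rather than once per 10 messages.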
   
   Contributed thanks to the efforts of [DevDevEve](https://github.com/DevDevEve)

