LaPetiteSouris commented on code in PR #24554:
URL: https://github.com/apache/airflow/pull/24554#discussion_r902540438
##########
airflow/providers/amazon/aws/sensors/sqs.py:
##########
@@ -215,3 +226,65 @@ def __init__(self, *args, **kwargs):
stacklevel=2,
)
super().__init__(*args, **kwargs)
+
+
+class SqsBatchSensor(SqsSensor):
+ """
+ Get messages from an Amazon SQS queue in batches and then delete the retrieved messages from the queue.
+ If deletion of messages fails, an AirflowException is thrown. Otherwise, all messages
+ are pushed through XCom with the key ``messages``.
+ The maximum total number of messages retrieved per poke equals the number of messages retrieved
+ per SQS API call multiplied by the number of calls. Each SQS receive_message call can return
+ at most 10 messages.
+ This sensor is identical to SqsSensor, except that SqsSensor performs exactly one SQS API call
+ per poke, while SqsBatchSensor performs multiple SQS API calls per poke.
+ .. seealso::
+ For more information on how to use this sensor, take a look at the guide:
+ :ref:`howto/sensor:SqsBatchSensor`
+ :param batch: The number of times the sensor will call SQS to receive messages (default: 1)
Review Comment:
`num_batches` sounds better IMHO, because the `SqsBatchSensor` in fact
performs exactly one poke anyway
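To make the docstring's upper bound concrete, here is a minimal sketch assuming the parameter were renamed to `num_batches`. The helper `max_total_messages` and the constant name are hypothetical and not part of the PR; the only fact relied on is that a single SQS `receive_message` call returns at most 10 messages.

```python
# Hypothetical helper, not part of the PR: it only illustrates the upper
# bound described in the docstring under review.
SQS_MAX_MESSAGES_PER_CALL = 10  # AWS limit for a single receive_message call


def max_total_messages(num_batches: int, max_messages: int) -> int:
    """Return the most messages a single poke could collect."""
    if num_batches < 1:
        raise ValueError("num_batches must be at least 1")
    if not 1 <= max_messages <= SQS_MAX_MESSAGES_PER_CALL:
        raise ValueError("max_messages must be between 1 and 10")
    # One poke issues num_batches calls, each capped at max_messages.
    return num_batches * max_messages


print(max_total_messages(num_batches=3, max_messages=10))  # 30
```

With the values used in the system test (`max_messages=10`, three calls), one poke would return at most 30 messages.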
##########
tests/system/providers/amazon/aws/example_sqs.py:
##########
@@ -66,6 +66,20 @@ def delete_queue(queue_url):
)
# [END howto_sensor_sqs]
+ # [START howto_sensor_sqs_batch]
+ # Batch multiple messages from SQS.
+ # Each SQS poll can retrieve no more than 10 messages,
+ # a limit imposed by AWS SQS.
+ read_from_queue_in_batch = SqsBatchSensor(
+ task_id='read_from_queue_in_batch',
+ sqs_queue=create_queue,
+ # get at most 10 messages per poll
+ max_messages=10,
+ # perform 3 polls before returning results
+ batch=3,
+ )
+ # [END howto_sensor_sqs_batch]
+
Review Comment:
wip
##########
airflow/providers/amazon/aws/sensors/sqs.py:
##########
@@ -136,15 +135,27 @@ def poke(self, context: 'Context'):
messages = self.filter_messages(messages)
num_messages = len(messages)
self.log.info("There are %d messages left after filtering",
num_messages)
+ return messages
+
+ def poke(self, context: 'Context'):
+ """
+ Check for messages on the subscribed queue and write them to XCom with key ``messages``
- if not num_messages:
+ :param context: the context object
+ :return: ``True`` if message is available or ``False``
+ """
+ sqs_conn = self.get_hook().get_conn()
Review Comment:
Indeed, I did not pay attention to that
##########
airflow/providers/amazon/aws/sensors/sqs.py:
##########
@@ -215,3 +226,65 @@ def __init__(self, *args, **kwargs):
stacklevel=2,
)
super().__init__(*args, **kwargs)
+
+
+class SqsBatchSensor(SqsSensor):
+ """
+ Get messages from an Amazon SQS queue in batches and then delete the retrieved messages from the queue.
+ If deletion of messages fails, an AirflowException is thrown. Otherwise, all messages
+ are pushed through XCom with the key ``messages``.
+ The maximum total number of messages retrieved per poke equals the number of messages retrieved
+ per SQS API call multiplied by the number of calls. Each SQS receive_message call can return
+ at most 10 messages.
+ This sensor is identical to SqsSensor, except that SqsSensor performs exactly one SQS API call
+ per poke, while SqsBatchSensor performs multiple SQS API calls per poke.
+ .. seealso::
+ For more information on how to use this sensor, take a look at the guide:
+ :ref:`howto/sensor:SqsBatchSensor`
+ :param batch: The number of times the sensor will call SQS to receive messages (default: 1)
+ """
+
+ def __init__(
+ self,
+ *,
+ batch: int = 1,
+ **kwargs,
+ ):
+ super().__init__(**kwargs)
+ self.batch = batch
+
+ def poke(self, context: 'Context'):
+ """
+ Check for messages on the subscribed queue and write them to XCom with key ``messages``
+ :param context: the context object
+ :return: ``True`` if message is available or ``False``
+ """
+ sqs_conn = self.get_hook().get_conn()
+ message_batch = []
+ # perform multiple SQS calls to retrieve messages in series
+ for _ in range(self.batch):
Review Comment:
WIP
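For discussion while this is still in progress, a rough sketch of how the loop could aggregate results across calls. This is not the PR's implementation: `FakeSqsClient` and `receive_in_batches` are hypothetical names, the fake client only mimics the response shape of boto3's `receive_message` (a dict whose `Messages` key is absent when nothing is returned), and filtering and deletion are left out. Stopping early on an empty response is an assumption; with short polling SQS can return an empty response even when messages remain, so the real sensor may prefer to keep polling.

```python
from typing import Any, Dict, List


class FakeSqsClient:
    """Stand-in for the boto3 SQS client, for illustration only."""

    def __init__(self, bodies: List[str]):
        self._queue = list(bodies)

    def receive_message(self, QueueUrl: str, MaxNumberOfMessages: int = 1) -> Dict[str, Any]:
        # Hand out up to MaxNumberOfMessages bodies and keep the rest queued.
        taken = self._queue[:MaxNumberOfMessages]
        self._queue = self._queue[MaxNumberOfMessages:]
        # Like boto3, omit the "Messages" key when nothing was received.
        return {"Messages": [{"Body": body} for body in taken]} if taken else {}


def receive_in_batches(
    client: Any, queue_url: str, num_batches: int, max_messages: int
) -> List[Dict[str, Any]]:
    """Call receive_message up to num_batches times and merge the results."""
    message_batch: List[Dict[str, Any]] = []
    for _ in range(num_batches):
        response = client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=max_messages
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # assumption: treat an empty response as "queue drained"
        message_batch.extend(messages)
    return message_batch


client = FakeSqsClient([f"msg-{i}" for i in range(25)])
collected = receive_in_batches(
    client, "https://example.invalid/queue", num_batches=3, max_messages=10
)
print(len(collected))  # 25
```

Here the three calls return 10, 10, and 5 messages, so the merged list holds all 25.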
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.