vafremov213 opened a new issue, #60836:
URL: https://github.com/apache/airflow/issues/60836

   ### Description
   
   Add an optional **`max_mails`** parameter to the `ImapHook` attachment methods
   (`download_mail_attachments`, etc.) that limits the number of most recent
   emails processed, while still retrieving all attachments from those emails.
   
   ### Use case/motivation
   
   Problem
   
   ImapHook currently has only two options for processing attachments from 
emails matching the given
   mail_filter. 
   
   For example:
   - the last 3 emails match the filter
   - each email contains 2 attachments
   
   Currently, users must either:
   - download all 6 attachments, or
   - use `latest_only=True` and receive only a single attachment
   
   There is no way to get all attachments from only the latest email when it has more than one.
   There is no way to retrieve all attachments from the last N emails.
   There is no way to limit the number of emails processed other than the
   `latest_only` flag.
   
   This becomes problematic when working with emails that contain multiple 
attachments.
   
   Proposed solution
   
   Introduce an optional **max_mails** parameter to `ImapHook` methods that 
retrieve
   messages or attachments.
   
   The parameter would limit the number of **latest emails** being processed,
   while still returning **all attachments** from those emails.
   
   The default value would be `None`, preserving the current behavior.
   
   Conceptually, this can be implemented by modifying the internal 
_list_mail_ids_desc method as follows:
   
   ```python
   import itertools
   from collections.abc import Iterable

   def _list_mail_ids_desc(
       self,
       mail_filter: str,
       max_mails: int | None = None,
   ) -> Iterable[str]:
       if not self.mail_client:
           raise RuntimeError("The 'mail_client' should be initialized before!")

       _, data = self.mail_client.search(None, mail_filter)
       mail_ids = data[0].split()

       # Newest first: the search result lists IDs in ascending order.
       mail_ids_desc = reversed(mail_ids)

       # Cap the number of emails; None keeps the current behavior.
       if max_mails is not None:
           return itertools.islice(mail_ids_desc, max_mails)

       return mail_ids_desc
   ```
   
   Of course, this parameter should also be added to the attachment methods.
   
   Example use case
   
   ```python
   hook.retrieve_mail_attachments(
       mail_filter="SINCE 01-Jan-2024",
       max_mails=2,
   )
   ```

   This would retrieve all attachments from the two most recent matching emails.
   
   
   
   ### Related issues
   
   Haven't found any
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

