dstandish commented on issue #16627:
URL: https://github.com/apache/airflow/issues/16627#issuecomment-873521269


   I did this for our internal repo: I **refactored `list_keys` to call a new `list_objects` method**, so you can get the full objects back and filter them afterwards:
   
   ```python
       # Method on the S3 hook. Assumes `from typing import List, Optional`, Airflow's
       # `provide_bucket_name` decorator, a `DateTime` type (e.g. pendulum's), and an
       # internal `S3Object` type that is not shown here.
       @provide_bucket_name
       def list_objects(
           self,
           bucket_name: Optional[str] = None,
           prefix: Optional[str] = None,
           delimiter: Optional[str] = None,
           page_size: Optional[int] = None,
           max_items: Optional[int] = None,
           start_after_key: Optional[str] = None,
           start_after_time: Optional['DateTime'] = None,
       ) -> List[S3Object]:
           """
           Lists objects in a bucket under prefix and not containing delimiter
   
           Args:
               bucket_name: the name of the bucket
               prefix: a key prefix
               delimiter: the delimiter marks key hierarchy.
               page_size: pagination size
               max_items: maximum items to return
               start_after_key: should return only keys greater than this key
               start_after_time: should return only keys with LastModified attr greater than this time
           """
           ...  # implementation not included in this comment
   ```
   
   This lets you use either `start_after_key`, which S3 supports natively (the `StartAfter` parameter of `list_objects_v2`), or `start_after_time`, which is what you're after. The time filter has to be applied client-side, so it requires listing every object under the prefix.
   
   It would also make it easy to filter on other object attributes, if people want that.
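Because `list_objects` hands back whole objects, any other filter is plain Python over the results. A small sketch, again with a hypothetical `S3Object` stand-in and a hypothetical `filter_objects` helper, filtering on `Size`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for the internal S3Object type."""

    Key: str
    LastModified: datetime
    Size: int


def filter_objects(
    objects: Iterable[S3Object], predicate: Callable[[S3Object], bool]
) -> List[S3Object]:
    """Keep only the objects for which predicate returns True."""
    return [o for o in objects if predicate(o)]


objects = [
    S3Object("data/a.csv", datetime(2021, 7, 1, tzinfo=timezone.utc), 0),
    S3Object("data/b.csv", datetime(2021, 7, 2, tzinfo=timezone.utc), 2048),
]
# e.g. skip zero-byte placeholder objects
non_empty = filter_objects(objects, lambda o: o.Size > 0)
print([o.Key for o in non_empty])  # ['data/b.csv']
```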
   
   I think that might not be a bad way to go here.  
   
   Then `list_keys` becomes something like this:
   
   ```python
       @provide_bucket_name
       def list_keys(
           self,
           bucket_name: Optional[str] = None,
           prefix: Optional[str] = None,
           delimiter: Optional[str] = None,
           page_size: Optional[int] = None,
           max_items: Optional[int] = None,
           start_after_key: Optional[str] = None,
           start_after_time: Optional['DateTime'] = None,
       ) -> List[str]:
           # Delegate to list_objects, then keep only the key names
           objects = self.list_objects(
               bucket_name=bucket_name,
               prefix=prefix,
               delimiter=delimiter,
               page_size=page_size,
               max_items=max_items,
               start_after_key=start_after_key,
               start_after_time=start_after_time,
           )
           return [o.Key for o in objects]
   ```